Panning for meaning in unit tests

I recently made a code change, and also made the corresponding changes to the unit tests. Once that was sorted and tidied away in a commit, I spent another commit refactoring the unit tests. As I was refactoring, I realised that the motivation behind the refactoring, i.e. what was influencing its direction, was a desire to make the meaning of the tests clearer. It was like there was already meaning there, but it was hidden amongst the details of the tests. Or, more poetically, I was panning for meaning in the unit test code, like someone panning for gold in the silt and grit of a river.

One of the differences between code and poetry is that in code you don’t want ambiguity, or layers of meaning packed tightly into the text, where some of the enjoyment comes from gradually extracting more and more meaning on careful and lengthy study. It’s more like a military officer issuing an order – something direct and quick to understand.

[Image: a man panning for gold in a river]

Making things a bit more concrete

That’s all very lovely, but what does it mean? How could you do this if you wanted to? I go into the details of this in a few existing posts, so I’ll link to them but then skip over the details:

The tests needed quite a bit of tidying up, but I wanted to minimise the risk of breaking anything, so I cleaned up in small stages, moving from one working version of the tests to a slightly tidier but still working version of the tests.

Constants

The first stage, because it was small and very low risk, was addressing magic constants. Why is 1 passed to this method, and not 17? Why is the test checking that the result is “gold”? Does it have any relation to this other thing being passed “gold”? So I defined named constants wherever that made sense, and used them instead of the strings and numbers.

Once the tests were happy with named constants, I made sure that the same value showing up in more than one place was because those were different instances of the same thing. I.e. if I needed a CustomerId and an AddressId, I used e.g. 200 for the CustomerId and 300 for the AddressId. (This was just changing the values given to the named constants, because I had already replaced numbers and strings in the body of the code with the named constants.) This meant that if 200 ever showed up somewhere, I knew it was something to do with customer ids.
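As a minimal sketch of what this can look like, here is a small Python example. The function `delivery_label`, the constant names and the street value are all made up for illustration; only the values 200 and 300 come from the description above.

```python
# Hypothetical code under test: builds a delivery label for a customer's address.
def delivery_label(customer_id, address_id, addresses):
    street = addresses[(customer_id, address_id)]
    return f"Customer {customer_id}: {street}"


# Named constants instead of magic values. Each kind of thing gets its own
# distinct value, so 200 always means "customer id" and 300 "address id".
CUSTOMER_ID = 200
ADDRESS_ID = 300
STREET = "1 River Road"


def test_label_contains_customer_and_street():
    addresses = {(CUSTOMER_ID, ADDRESS_ID): STREET}
    label = delivery_label(CUSTOMER_ID, ADDRESS_ID, addresses)
    # The expected value is built from the same constants, which makes the
    # relationship between input and output explicit.
    assert label == f"Customer {CUSTOMER_ID}: {STREET}"
```

The test body no longer contains any bare numbers or strings, so a reader can see which input each part of the expected result comes from.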

As well as making things a bit clearer, it made the tests stronger for little effort. Suppose the code had a bug where it used an address id where it should have used a customer id. If the tests used the same value for both (usually the value 1), they would still pass and would fail to spot the bug.
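To make that concrete, here is a small sketch with a deliberately buggy function. Both `find_orders_buggy` and `passes_test` are hypothetical names invented for this illustration, not code from the project described here.

```python
def find_orders_buggy(orders, customer_id, address_id):
    # Deliberate bug for illustration: compares against address_id
    # where it should compare against customer_id.
    return [o for o in orders if o["customer_id"] == address_id]


def passes_test(customer_id, address_id):
    """Run the same check with the given ids; True means the test passes."""
    orders = [{"customer_id": customer_id, "address_id": address_id}]
    return find_orders_buggy(orders, customer_id, address_id) == orders


# With the same value for both ids the bug is invisible.
hidden = passes_test(1, 1)
# With distinct values the same check fails, exposing the bug.
exposed = not passes_test(200, 300)
```

The check itself is identical in both cases. Only the test data changed, and that alone decides whether the bug is caught.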

Bigger changes

The constants were now in good shape, after only local edits to the tests. It was now time to move on to wider-ranging changes that would be riskier but would also bring a more obvious improvement.

The tests appeared to be the victim of copy / paste / edit, which is something we’re all tempted by when in a hurry. As Scott Hanselman has said many times, as programmers, when we find ourselves doing something more than once we should strongly consider automating it. Copy / paste / edit is effectively doing something more than once – putting a bit of code into the flow of execution twice. In this case, “automating it” means factoring things out into a new method.

Unfortunately, there’s no simple advice I can give here that would be useful in all circumstances. It’s a matter of looking at the code enough to spot the patterns. For instance, sometimes the differences between things matter, and sometimes they don’t.

It could be that test 1 sets up things A and B in test data, but test 2 sets up A and C. It might be that both tests could cope with A, B and C in the test data. However, the programmer did the minimum for test 1 (A and B), realised that C was necessary for test 2 but B wasn’t, so just did copy / paste / edit to change B into C for test 2. I.e., they did the smallest and quickest operation to produce test 2 by cloning test 1. So in this case a method that creates A, B and C in the test data could remove the repetition from the tests.
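A rough sketch of that first case, in Python. The contents of A, B and C and the function `describe` are placeholders invented for the example; the point is only that both tests share one setup method.

```python
def make_test_data():
    """Create A, B and C together, so no test hand-builds its own subset."""
    return {"A": "customer", "B": "address", "C": "credit limit"}


def describe(data, keys):
    # Hypothetical code under test: joins the requested items into a string.
    return ", ".join(data[k] for k in keys)


def test_uses_a_and_b():
    # B is needed here; the extra C in the data is harmless.
    assert describe(make_test_data(), ["A", "B"]) == "customer, address"


def test_uses_a_and_c():
    # C is needed here; the extra B in the data is harmless.
    assert describe(make_test_data(), ["A", "C"]) == "customer, credit limit"
```

This only works when the extra item really is harmless to each test, which is the kind of thing you have to check by reading the tests rather than assume.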

It could be that there is a difference between tests that does matter. For instance, test 1 needs a deleted customer, test 2 needs an inactive one, and both need an active customer. All the not-active customers also need some properties in common because e.g. they will be passed to a method that needs to be given certain arguments even if it ends up not using them. (This could be to test that something ignores deleted customers, and then that it ignores inactive customers, by checking that only the active customer is processed.)

In which case you could factor this out into a method that creates test data, where the parameters to this method have default values. Then, each test method only needs to tell this test data method how the data for this test is non-default. E.g. for test 1, the customer must be deleted, and for test 2 it needs to be inactive. Because the other parameters, e.g. address, credit limit etc. haven’t been overridden, they will stay as their default values.
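Here is one way this could look in Python, using keyword arguments with defaults. The `Customer` fields, the status names, the default values and `customers_to_process` are all assumptions made up for this sketch.

```python
from dataclasses import dataclass

ACTIVE, INACTIVE, DELETED = "active", "inactive", "deleted"


@dataclass
class Customer:
    customer_id: int
    status: str
    address: str
    credit_limit: int


def make_customer(customer_id=200, status=ACTIVE,
                  address="1 River Road", credit_limit=1000):
    """Test data method: every parameter has a default value."""
    return Customer(customer_id, status, address, credit_limit)


def customers_to_process(customers):
    # Hypothetical code under test: only active customers are processed.
    return [c for c in customers if c.status == ACTIVE]


def test_ignores_deleted_customers():
    active = make_customer()
    deleted = make_customer(customer_id=201, status=DELETED)
    assert customers_to_process([active, deleted]) == [active]


def test_ignores_inactive_customers():
    active = make_customer()
    inactive = make_customer(customer_id=202, status=INACTIVE)
    assert customers_to_process([active, inactive]) == [active]
```

Each test now states only what is special about its data (the status), and the address and credit limit stay out of sight because they don't matter to these tests.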

Unfortunately, to do this kind of thing safely and well means you have to properly understand the tests, what their purpose is, how they exercise the code etc. You could get a certain way doing just a superficial edit, but knowing if an edit will or won’t change the behaviour of the code often needs a deeper understanding.

The result

The test code is likely to get shorter, but this isn’t guaranteed and also isn’t the main purpose. You aren’t optimising for length – you are optimising for meaning. If it helps, you are trying to increase the entropy or information content / density of the tests.

You’re aiming for the next person (or you, in a few weeks’ time) to be able to look at a short-ish test method only briefly and then go “Oh yes, now I get it”. You want them to be able to understand the test’s purpose, and why it needs to exist – what value it adds on top of the tests that have gone before it in the test source code.

The purpose and value of each test were always there, but were hidden in the details. Some differences between tests were meaningful, but others were accidental and meaningless. You want each line in the test code to make as much difference as possible.

Don’t get carried away

There are two things in tension here, and how you resolve this tension is up to you and your circumstances. If you go too far, then you will reduce redundancy and increase information but at the expense of understandability. For instance, it might be that not all meaningful differences in test data can be comfortably accommodated by a simple test data method. If you try to include all test data, with all its variety, you might need a complicated and hard-to-understand test data method.

It might be better to have a simpler test data method, that doesn’t cope with all circumstances, but is easier to understand. This would mean that the test methods that can’t use the test data method are more complicated, but the other test methods are probably simpler because they use a simpler test data method.
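As an illustration of that trade-off, here is a sketch where the helper stays simple and one unusual test builds its data by hand. The order structure, `make_order` and `order_total` are hypothetical and exist only for this example.

```python
def make_order(quantity=1, unit_price=10):
    """Simple helper: a single-line order. It deliberately does not try
    to support multi-line orders."""
    return {"lines": [{"quantity": quantity, "unit_price": unit_price}]}


def order_total(order):
    # Hypothetical code under test: sums quantity * price over all lines.
    return sum(line["quantity"] * line["unit_price"] for line in order["lines"])


def test_total_scales_with_quantity():
    # The common case uses the simple helper.
    assert order_total(make_order(quantity=3)) == 30


def test_total_with_refund_line():
    # The odd one out: built by hand rather than complicating make_order.
    order = {"lines": [
        {"quantity": 2, "unit_price": 10},
        {"quantity": 1, "unit_price": -5},  # refund line
    ]}
    assert order_total(order) == 15
```

The second test is longer than it would be with a more capable helper, but `make_order` stays easy to read, and every other test that uses it benefits.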

You’re trying to optimise for the understandability of the test set as a whole. That might mean accepting a few local exceptions, e.g. one or two test methods whose data is a bit weird and that can’t themselves be made as easy to understand as the rest.

2 thoughts on “Panning for meaning in unit tests”

lewiscowles says:

January 25, 2021 at 6:16 am

When you need a deleted customer, an inactive customer etc., you may be able to set up named fixtures with such names.

I really enjoy Python for this, as you can set up complex chains of annotations which lessen the body of tests and functions, to create extremely readable tests and software.

Of course as you say there should be a limit and an off the top of the head cost-benefit-analysis to this, but the abstract-correctness nerd in me loves opportunities to work in this way.
