Solving computer problems with indirection

There’s a pattern that crops up a lot in computing – indirection. It’s sometimes a little bit disguised, but it’s used to solve many kinds of problem. To introduce it I’ll first use an example from outside computing, one that I like for many reasons.

Five freedoms for animal welfare

It might be a bit odd to take a detour into animal welfare, but I recently learned about the five freedoms for animal welfare. They summarise how farm animals should be looked after:

Freedom from hunger or thirst by ready access to fresh water and a diet to maintain full health and vigour;

Freedom from discomfort by providing an appropriate environment including shelter and a comfortable resting area;

Freedom from pain, injury or disease by prevention or rapid diagnosis and treatment;

Freedom to express (most) normal behaviour by providing sufficient space, proper facilities and company of the animal’s own kind;

Freedom from fear and distress by ensuring conditions and treatment which avoid mental suffering.

I like this for several reasons. The first is the animal welfare aspect – it is an animal-centric approach, rather than being based on procedures, tools, money etc. The second is that it has a mix of “freedom from” and “freedom to” – the preposition makes a small but important difference, so that all the relevant meanings can be covered. It reminds me of power over/to/with/within – again, the difference in preposition makes an important difference in the meaning so that all the kinds of power can be expressed.

But the main reason why I like it is that it is clear, concise and (in one sense) complete. There is no language that’s deliberately hard to understand, to act as a gatekeeper. The reason why I mention it in this article is that in another sense it’s not complete. It doesn’t tell you the suitable diet for pigs, how to rapidly diagnose or treat disease in cows etc.

The five freedoms give general principles that are universal across different animals, and that can be easily understood and agreed upon. They set up a framework into which details that are specific to different animals can be fitted. They have broken down one big problem (how to look after animals) into a series of smaller and more manageable problems (the diet for pigs, the diet for cows, treating pigs, treating cows etc.). They don’t give this detailed information; you would have to look this up elsewhere.

The reason why I include it in this article is that I think it’s an example of the use of indirection. Instead of going straight to an enormous specification of animal husbandry, you go to the principles first, and then from there to the relevant details.

I’ll now give some examples of solving problems in an area with which I’m more familiar – getting computers to do things.

Road signs in English and Welsh saying that the road is closed and to follow a diversion — Photo by Jeremy Segrott, under CC BY 2.0.
Sometimes to get to where we want to be, we need to follow an indirect route.

Methods and functions

Something that programmers probably take for granted in most programming languages is the idea that they can group together some code that does all of a job, and then assign a name to that group of code. This is usually referred to as a method, function, or procedure. You can then use that group of code by putting its name into the flow of some other code, often also passing extra bits of information that the group of code will need (its arguments).

Even though this is taken for granted quite quickly by programmers, it had to be invented by someone. I was brought up to think it was invented by David Wheeler, hence its early name: The Wheeler Jump.

It’s worth thinking about it a little in terms of indirection. You could have all your code written in one very long flow. If you needed to do the same thing more than once, you would have to have the corresponding code many times. Another problem is that it would be really hard to see the wood for the trees. What methods etc. let you do is write shorter chunks of code, that each accomplish only one task and delegate to other chunks to get other tasks done. This has lots of good software engineering names such as separation of concerns and cohesion.
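As a small illustration of the point above, here is a minimal sketch in Python. The function name, the tax rate and the prices are all made up for the example; the only thing it is meant to show is one job being given one name and then reused.

```python
TAX_RATE = 0.2  # assumed rate, purely for illustration


def total_with_tax(prices):
    """Add up the prices and apply tax: one job, with one name."""
    return sum(prices) * (1 + TAX_RATE)


# The calling code refers to the job by name instead of repeating its steps.
basket_total = total_with_tax([10.0, 20.0])
order_total = total_with_tax([5.0])
```

If the tax calculation ever changes, it changes in one place, and both callers pick up the change through the name.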

There is a cost to this. The details depend on how much work is done before the code is run (e.g. at compile time) and how much is done when it runs, but the jobs to be done are usually the same. Something needs to keep track of the mapping between names and bits of code; this mapping is often known as a symbol table.

When a method is called, execution of the calling code is temporarily paused, but needs to be resumed when the called code is finished. This is usually accomplished via a stack, which needs to be managed by the runtime system that supports your code. Usually this cost is greatly outweighed by the benefits that indirection gives in this situation – code that is easier to understand, easier to maintain and contains less duplication.
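The bookkeeping described here can be modelled with a toy machine in Python. This is only a sketch under my own assumptions: the instruction names, the `run` function and the tiny program are hypothetical, and real runtimes do this work in machine code rather than in a loop like this. It shows a name being looked up to find its code, and a stack remembering where the paused caller should resume.

```python
# A toy program: a list of (operation, argument) pairs.
program = [
    ("call", "greet"),           # 0: main code calls greet by name
    ("print", "back in main"),   # 1: where main resumes afterwards
    ("halt", None),              # 2
    ("print", "hello"),          # 3: start of the code named greet
    ("return", None),            # 4
]

# "Symbol table": maps each name to where its code starts.
symbol_table = {"greet": 3}


def run(program, symbol_table):
    output = []
    stack = []  # resume positions of paused callers
    pc = 0      # index of the instruction being executed
    while True:
        op, arg = program[pc]
        if op == "call":
            stack.append(pc + 1)    # remember where to resume
            pc = symbol_table[arg]  # indirection: name -> code
        elif op == "return":
            pc = stack.pop()        # resume the paused caller
        elif op == "print":
            output.append(arg)
            pc += 1
        elif op == "halt":
            return output


trace = run(program, symbol_table)
```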

Many cultures

Imagine I’m writing some code to parse some text and extract a number from it. A regular expression is often a good choice for this (please look at my article on regular expressions if they’re new to you). I might write a regular expression like this:

/\d{1,3}(,\d{3})*(\.\d+)?/

(In practice I would probably add a ?: inside each ( and also a ^ and $ at the ends, but they would make it even more complicated.)

This will match numbers that look like this:

1
123
1,234
123,456,789
1.2345
1,234.56

Going from left to right in the regular expression:

\d{1,3} = match 1-3 digits
(,\d{3})* = match 0 or more repeats of the pattern which is a thousands separator followed by 3 digits
(\.\d+)? = optionally match the pattern which is a decimal separator followed by 1 or more digits
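
The breakdown above can be checked with a short Python sketch using the standard `re` module. The variable names are my own; `fullmatch` is used so that the whole string has to match, which has the same effect as adding ^ and $ to the pattern.

```python
import re

# The UK-style pattern from the article, written as a raw string.
UK_NUMBER = re.compile(r"\d{1,3}(,\d{3})*(\.\d+)?")

examples = ["1", "123", "1,234", "123,456,789", "1.2345", "1,234.56"]

# Map each example to whether the whole string matches the pattern.
results = {text: bool(UK_NUMBER.fullmatch(text)) for text in examples}

# A group of the wrong size after the separator should be rejected.
rejects_bad_grouping = UK_NUMBER.fullmatch("12,34") is None
```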

This works in the UK, but won’t work in places like France or Germany, because the separators are different in those countries. In France, thousands are separated by a space, and decimals are separated by a comma. So, in France I’d want the regular expression

/\d{1,3}( \d{3})*(,\d+)?/

Note the difference before the \d{3} and the \d+.

One way to have code that can cope with many regions in the world, with the minimum of complication and bloat, is to introduce a layer of indirection in the regular expression. We start off by treating the regular expression as just a text string that’s assembled from parts, some of which are hard-coded e.g. “\d{1,3}” and some come from variables for the bits that vary from region to region. Then we have some way to give values to the variables for France, the UK etc. The variables are the layer of indirection in this case.
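
A minimal sketch of this idea in Python follows. The function and parameter names are hypothetical, chosen for the example. The two separators are the variables; everything else in the pattern is hard-coded.

```python
import re


def number_pattern(thousands_sep, decimal_sep):
    """Assemble the regular expression from fixed parts plus two variables."""
    # re.escape stops a separator such as "." being read as regex syntax.
    t = re.escape(thousands_sep)
    d = re.escape(decimal_sep)
    return re.compile(r"\d{1,3}(" + t + r"\d{3})*(" + d + r"\d+)?")


# Giving values to the variables for two regions.
uk = number_pattern(",", ".")
france = number_pattern(" ", ",")
```

With this in place, `uk.fullmatch("1,234.56")` and `france.fullmatch("1 234,56")` both succeed, while each pattern rejects the other region's format.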

Breaking the code up into hard-coded bits plus variables, where the variables have region-specific values, is part of internationalisation i.e. making the code cope with many regions in general. Supplying region-specific values for a given region is part of localisation i.e. making the code work in a specific region. Internationalisation is a one-off activity; localisation needs to be repeated for each new region.
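
The split between the two activities might look like the following Python sketch. The region keys, the `SEPARATORS` table and the `parse_number` function are assumptions made for the example, and the function assumes the text has already been checked to be a well-formed number. The function is the internationalised part, written once; each entry in the table is a piece of localisation.

```python
from decimal import Decimal

# Localisation: one entry of data per region.
SEPARATORS = {
    "uk": {"thousands": ",", "decimal": "."},
    "france": {"thousands": " ", "decimal": ","},
}


# Internationalisation: written once, with no region hard-coded.
def parse_number(text, region):
    seps = SEPARATORS[region]
    # Drop the thousands separators first, then normalise the decimal one.
    plain = text.replace(seps["thousands"], "").replace(seps["decimal"], ".")
    return Decimal(plain)


# Supporting a new region means adding data, not changing parse_number.
SEPARATORS["germany"] = {"thousands": ".", "decimal": ","}
```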

There is much more to internationalisation and localisation than I describe here. It can include text direction, pluralisation rules, currency symbols, address formats, translating into different languages, tax rules etc. In my experience these all involve some form of adding indirection. Taking code that’s specific to one region and internationalising it can be quite a tedious process, as you need to track down all the places where the code makes assumptions that are no longer valid (e.g. that the thousands separator is always “,”, or that the text to display on the screen when a certain error occurs is always “Unable to add that person because they use an email address that’s already in use.”).

Data in normalised relational database tables

Indirection can also occur in databases, particularly relational databases that have been at least a little bit normalised. In such a database there might be a table for orders, which could look a bit like this:

OrderId
OrderDateTime
CustomerId
ProductId
Quantity

This contains all the information necessary to process the order, but only some of it is accessible directly – the rest is accessible only after traversing one or more layers of indirection. In order to send the order to the correct place, I need to know who the customer is and where they live. However, all the order says is the CustomerId. This is a foreign key to the Customer table, where I hope to find an entry whose primary key has the same value as the CustomerId on this order.

It might be that the Customer table doesn’t have the address on it (so that customers can easily have more than one address, such as billing address and a shipping address, and so that customers can share addresses e.g. people on the same family plan). In this case it would have an AddressId, and I would have to follow another layer of indirection to get to the Address table.

I would have to do the same thing for getting the details of the product, such as its unit price.
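
Here is a sketch of following those layers of indirection, using Python's built-in `sqlite3` module and an in-memory database. Everything beyond the order columns listed earlier is an assumption for the example: the columns on the other tables, the sample rows, and the table name `Orders` (plural because ORDER is a reserved word in SQL).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Address  (AddressId INTEGER PRIMARY KEY, Street TEXT, City TEXT);
    CREATE TABLE Customer (CustomerId INTEGER PRIMARY KEY, Name TEXT,
                           AddressId INTEGER REFERENCES Address(AddressId));
    CREATE TABLE Product  (ProductId INTEGER PRIMARY KEY, Name TEXT, UnitPrice REAL);
    CREATE TABLE Orders   (OrderId INTEGER PRIMARY KEY, OrderDateTime TEXT,
                           CustomerId INTEGER REFERENCES Customer(CustomerId),
                           ProductId INTEGER REFERENCES Product(ProductId),
                           Quantity INTEGER);
    INSERT INTO Address  VALUES (1, '1 High Street', 'Cardiff');
    INSERT INTO Customer VALUES (1, 'Alice', 1);
    INSERT INTO Product  VALUES (1, 'Widget', 2.5);
    INSERT INTO Orders   VALUES (1, '2022-11-06 10:00', 1, 1, 4);
""")

# Each JOIN follows one foreign key, i.e. one layer of indirection.
row = conn.execute("""
    SELECT c.Name, a.Street, a.City, p.Name, o.Quantity * p.UnitPrice
    FROM Orders o
    JOIN Customer c ON c.CustomerId = o.CustomerId  -- order -> customer
    JOIN Address  a ON a.AddressId  = c.AddressId   -- customer -> address
    JOIN Product  p ON p.ProductId  = o.ProductId   -- order -> product
    WHERE o.OrderId = 1
""").fetchone()
```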

Normalising the data like this results in less duplication of data (the details of the products aren’t copied onto each order, for instance), which makes it easier to change data while keeping things consistent. E.g. if I change my name, this would mean only my entry in the Customer table would change, rather than the copy of my name and other details on every order. It also means that each table has a more tightly-defined purpose.
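
The name-change case can be sketched in the same way, again with `sqlite3` and made-up data. The cut-down schema and the names are assumptions for the example. One row is updated, and every order sees the new name because it reaches the name through CustomerId.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customer (CustomerId INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Orders   (OrderId INTEGER PRIMARY KEY, CustomerId INTEGER);
    INSERT INTO Customer VALUES (1, 'Alice Smith');
    INSERT INTO Orders   VALUES (1, 1), (2, 1), (3, 1);
""")

# One row changes, however many orders the customer has placed.
cursor = conn.execute(
    "UPDATE Customer SET Name = 'Alice Jones' WHERE CustomerId = 1"
)
rows_changed = cursor.rowcount

# Every order picks up the new name through the CustomerId indirection.
names = [name for (name,) in conn.execute("""
    SELECT c.Name
    FROM Orders o
    JOIN Customer c ON c.CustomerId = o.CustomerId
""")]
```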

The cost of this is a more complex data model. Getting all the data you need to do a job might need a more complex query which, if you’re not careful, can be very slow. Not only do you have to write a more complex query, but behind the scenes the query optimiser also has to work harder to translate your query into a plan that the database can execute.

Indirection as (too) loose coupling

As well as the problems I’ve already described, I have encountered other problems when indirection was, in my opinion, done wrong. As I describe in my article on the cost of flexibility, sometimes the layer of indirection puts too much distance between the two things it’s connecting. Another way of putting it is that the coupling is too loose. Indirection can allow one thing to be chopped up into two connected things (two methods, two database tables etc.). If the indirection is done poorly, the connection becomes tenuous or opaque. It’s hard to see one side from the other, and so understanding and debugging become harder. It’s possible that there are still benefits to this approach, but I encourage you to think about how much information can flow through the layer of indirection, and whether there are ways this can be improved.

Summing up

Indirection is such a common tool in software that it’s behind the fundamental theorem of software engineering:

We can solve any problem by introducing an extra level of indirection, except for the problem of too many levels of indirection

There are usually costs as well as benefits to using indirection, so you need to know if the trade-off is worth it for your context. A common benefit is a separation of concerns. In the original example – the five freedoms – these define only the universal fundamental principles, and leave other things to fill in the details. There can be other benefits too, which depend on the context.

UPDATE 20221106: Added the section on too loose coupling

2 thoughts on “Solving computer problems with indirection”

Shishir Pandey says:

November 7, 2022 at 5:39 pm

@Bob,

I am not entirely sure that too loose a coupling is that bad a thing. If some concept can be made opaque to another, wouldn’t it allow each concept to evolve independently? Furthermore, is it not natural for a system to evolve to a state of higher entropy, implying looser and looser cohesion? Shouldn’t that also be the case for software?

If we take a look at all data models that do not denormalize for example in MongoDB or the really wide tables in C* etc, the problem becomes unmanageable as well. IMHO complexity of the domain and the problem being addressed can lead to complexity of data models, I am not very sure if indirection is to be blamed here. If we were to take the same example you’ve taken customer with address in the same table then ability to distinguish between different types of table becomes difficult, if we add many addresses this table becomes really large and is hit way too often.

Bob says:

November 7, 2022 at 5:55 pm

Hi Shishir! The particular case I was referring to (indirectly!) was where some C# wanted to get data out of a SQL Server db table. Instead of using Entity Framework to query the data, it called a stored procedure that did the query and nothing else.

While adding the stored procedure step can open up the possibility of delegating lots of complexity to the data layer, in this case the delegation didn’t gain the code anything, but it lost the nice things you get from EF such as type checking between the C# code and the database. So, in this case, the indirection made things worse in my opinion.

So maybe “too loose a coupling” is the wrong description for it. It was doing the coupling poorly. If we wanted looser coupling that could have been e.g. via REST API, deserialising / serialising JSON using some schema.
