Computer science while doing the laundry 2: Bin sort

This is part of a short series of articles about computer science while doing the laundry:

Merge sort
Bin sort

In the previous article I used doing a lot of laundry to illustrate merge sort, which is probably an impractical way of doing the laundry. In this article I will suggest a way that might actually help with one of my least favourite bits of laundry, which is pairing up socks. It’s based on the fact that sorting takes non-linear time (which I will explain below) and relies on doing bin sorting quickly (which I will also explain).

The basic problem

I don’t think I need to explain socks or washing them. One solution to the sock pairing problem is to not care what gets paired with what. Another solution is to not have to care, because all your socks are identical. I’m going to assume that you do care at least a bit and that you have to care (because you have at least two kinds of sock).

This is going to get a little bit mathematical (but hopefully not too much), and will use the idea of time complexity. If that’s new to you, then I suggest you look at my article P = NP?, where I introduce it. In summary, it’s how the time it takes to solve a problem (like pairing socks) changes as the size of the problem (e.g. how many socks you have) changes. You can also worry about things like space complexity – how the space, such as memory, taken to solve a problem varies as the problem’s size varies, but for this article I’ll ignore everything apart from time complexity.

I’m also going to assume that pairing up socks is essentially a sorting problem. There are caveats to this, which I go into at the end, but for the bulk of the article I will assume that we need to sort the socks to get each sock to be next to its partner. If you can think of an algorithm for it that doesn’t involve sorting then I would like to hear about it.

Time trends in sorting

Sorting is a field where there’s a surprising amount of detail, almost all of which I’m skipping over. In general, the best case for how long it takes to sort a list of things with x items in is x * log(x). (For reasons I’ll explain below, it doesn’t actually matter what base the log is to, but for the sake of simplicity I’ll use base 10.) If you plot this on a graph, it looks like this:

graph showing y=x and y=x*log(x), for x >= 0. Both generally go up to the right, but y=x*log(x) climbs away further and further from y=x as x increases

The blue line shows y = x * log(x), and the red line is y = x for comparison. The comparison with y = x is important. If the graph for sorting were y = x, if you doubled the size of the list being sorted, you would double the amount of time taken. If the list got 10 times as big, it would take 10 times as long etc. This behaviour, often referred to as linear, is the best you can reasonably expect for many programming problems (unless you’re doing something wacky like using a quantum computer).

Unfortunately, sorting is worse than linear – its graph is higher up the page than y=x for most of the time. That means if you make the list twice as big, the time taken will be more than twice as big. It’s this difference (between the two graphs) that we will exploit when pairing socks.

Splitting the list before sorting

Just as making the list bigger will have a disproportionally bad effect on the time taken, making the list smaller will have a disproportionally good effect on it. That might not seem all that helpful – if you took only a fifth of the list and sorted it, it might be quick but you’d still have most (four fifths) of the list to do. However, if you managed to split the list up into unsorted fifths relatively quickly, and didn’t have to merge the sorted bits back again (as in merge sort), then the total time for sorting everything from the original list will be reduced.

This graph tries to illustrate that.

A graph of y=x, y=x*log(x) and y=0.2x*log(0.2x) for x>= 0. All generally increase as x increases. y=x*log(x) is greatest, then y=x, then y=0.2x*log*(x), at least in the range x=0-150

The red line shows y = x.log(x). The vertical line at x=100 is to help you see that it takes 200 units of time (e.g. milliseconds) to sort. The green line shows how long it will take to sort a list one fifth of the size of the list shown by the red line. So when x=100, the green line shows how long it takes to sort a list with 20 items, which is roughly 26. If it took 0 time to split the list of 100 into 5 lists of 20 each, and you didn’t have to merge the sorted 20s back together, the total time taken to sort 100 things via 5 lists of 20 is 5 * 26 = 130. This is only 65% of the time it takes to sort the same number of things in one big list.

As you can see, as x gets bigger, the vertical distance between the lines for the split and unsplit lists gets bigger, so the saving is bigger. However, to get this benefit you need a way to split the unsorted list into fifths that takes less time than the time you save by sorting smaller lists. Also, you need to be able to leave the sorted lists separate rather than merging them.

For sock pairing, this is probably true. Exactly how you do it will depend on the context, but you could separate socks into these groups:

stripey
logo
any other non-black socks
plain black
black but not plain

You can work through the pile of socks fairly quickly, throwing a sock into one of five piles after a quick look at it and minimal thought. Ideally, you want rules that result in five groups that are the same size. This is the way you minimise the size of the biggest group – the worse-than-linear nature of sorting the groups means that the biggest group will take a disproportionate slice of the total time. At the end, you can just gather up the sorted piles quickly, rather than having to interleave socks from different groups carefully and slowly.

It’s important that you pick rules that you can check against quickly, otherwise the time saving due to sorting smaller groups will be lost. So, it probably wouldn’t be a good idea to group based on size ranges, where you measure the size with a micrometer, or based on the results of a focus group, or using a microscope to measure thread count.

This rough sorting into groups can be known as bin sorting – you aren’t completely sorting the socks, but assigning them to groups or bins. The important aspect is the speed, and (in this case) getting roughly equal groups.

The effect of different numbers of groups

So far I’ve been assuming that there would be five groups, but that is arbitrary. What would happen if you used fewer or more groups? This graph shows the result of splitting the total list into 2, 4, 5 or 8 groups.

A graph showing y=x, y=x*log(x), y=2(0.5x*log(0.5x)), y=4(0.25x*log(0.25x)), y=5(0.2x*log(0.2x)), y=8(0.125x*log(0.125x)). All go up to the right, but y=x*log(x) is highest, then y=2(0.5x*log(0.5x) etc. down to y=x lowest — **Key**:
Red = total cost for 1 group
Blue = total cost for 2 groups
Green = total cost for 4 groups
Purple = total cost for 5 groups
Black = total cost for 8 groups
Yellow = y=x for comparison

As you can see, the more groups there are, the better the result. This isn’t a surprise, as it’s just exploiting the same worse-than-linear behaviour of sorting. However, splitting the list into more groups has at least two potential problems:

The rules get harder to evaluate, and so the splitting takes longer;
The chances of equal-sized groups goes down, and a group much bigger than the others will block the benefit we’re after.

Another way of looking at the benefit from more groups is in this graph. It shows the difference between the time taken to sort the list in one go and the total time taken to sort the same things but via 2, 4, 5 or 8 groups (assuming it takes 0 time to split the groups or join them again).

Graph showing y=x*log(x)-2(0.5x*log(0.5x)), y=x*log(x)-4(0.25x*log(0.25x)), y=x*log(x)-5(0.2x*log(0.2x)) and y=x*log(x)-8(0.125x*log(0.125x)). All are straight lines going up to the right. The lines are in order of the fractional log term - the line involving log(0.5x) is highest and the line involving log(0.125x) is lowest. — **Key:**
Black = total cost for 1 group – total cost for 2 groups
Green = total cost for 1 group – total cost for 4 groups
Blue = total cost for 1 group – total cost for 5 groups
Red = total cost for 1 group – total cost for 8 groups

The difference (the time saving) goes up as either the total number of things being sorted goes up, or the number of groups goes up (or both).

Caveat 1 – the time taken to sort things

Earlier I said that the time taken to sort x things is x * log(x). This isn’t strictly true. Normally the time taken by a program is expressed in Big O notation. Big O notation is

Time = O(f(x))

f(x) is some function of x, such as f(x) = x * log(x), and it’s normally read as “time is of the order of f(x)”.

What Time = O(f(x)) means is:

Time <= a*f(x) + b, for all x > c

where a, b and c are constants

In other words:

f(x) is an upper bound to the time – it might take less
Actually it’s not necessarily f(x) that describes the upper bound, but f(x) modified by two constants – one to multiply it and one added on at the end
The upper bound might not hold until you get to a certain minimum size of x (c).

I’ve been blithely plotting x * log(x) on a graph as if the performance will be e.g. exactly 26 units of time for 20 items. This is not necessarily true, but I’ve been doing it to help illustrate a general point that will be true. The time taken to sort a list will be proportional to x * log(x), at least most of the time.

The constant a is why it doesn’t matter which base you use for the logarithm. To convert between bases a and b you can use

log_b(x) = log_a(x) / log_a(b)

This means the log of a number in one base is equal to some constant (1/log_a(b)) times the log in any other base, and the value of that constant is fixed by the two bases you’re converting between. E.g. to convert between base 10 and base 2:

log₂(x) = log₁₀(x) / log₁₀(2)

log₂(x) = log₁₀(x) / 0.3

Caveat 2 – pairing is probably easier than sorting

Another assumption I’ve been making is that to pair up socks we need to sort them. This is probably not true, but I think it’s true enough that it’s still helpful. The reason why it’s not true is that we probably don’t need socks to be fully sorted, just paired. If we had 3 pairs of socks (A, B, C) which each had sock 1 and 2 (for left and right), then sorting would put the socks into exactly this arrangement and no other: A1 A2 B1 B2 C1 C2.

We probably don’t care that the left sock is always before the right, or that pair A is before pairs B and C. The 6 socks could be arranged randomly into 6! = 720 orders, and sorting puts them into exactly one of those orders. If we accept pairs in any order, and socks in either order within a pair, then we want any of 48 orders. (There are 3! = 6 ways of arranging the pairs, and the socks in each pair can be switched left/right or not independent of the socks in the other pairs, giving 2^3 = 8 ways of arranging the socks for a given arrangement of pairs, giving 6 * 8 = 48 possible orders.)

I don’t know an algorithm for pairing socks that isn’t sorting them – if you do please let me know. However, the important thing is the worse-than-linear time. It’s my belief that even the best way of pairing socks (that isn’t sorting them) would still take worse than linear time, and so the same general principle will apply. Pairing the same total amount of socks will take less time if you work on N piles of socks that each have 1/N of the total socks, and if you can generate those N piles quickly enough, the total time will be reduced by splitting first.

Summary

This is probably more maths than you were expecting for pairing socks. A friend of mine completely sidesteps this problem by having only plain black socks. If you do have a mountain of socks to pair up, I suggest that you try coming up with a quick way of splitting them up into 5 or so groups and see if the extra hassle is worth it.