This article is my attempt to put my money where my mouth is. A friend mentioned the BBC Radio 4 series In Our Time on Facebook, and in the conversation that followed I said that the only time In Our Time discussed something close to my home turf I got grumpy about how poorly I thought things were explained. So, this is my attempt at explaining things, in terms that I hope are suitable for a Radio 4 audience. Note, I didn’t say something like “in a way a grandmother could understand” because that’s sexist and ageist nonsense that gets exposed for what it is if the grandmother in question is someone like Grace Hopper or Margaret Hamilton.
The question I’ll be tackling is: P = NP? It’s a question at the theoretical end of computing that possibly doesn’t make sense even as a question, let alone its answer. However, one answer might mean that modern computer security, which underpins things like online shopping and banking, is at risk. Also, there is a $1 million prize to anyone who can answer it.
This will take quite a run-up, so I hope you’ll bear with me when I appear to be going into completely off-topic things to start with. Please trust me, and I will work my way towards an answer. To be fair to the In Our Time presenters, there will be no pictures or diagrams, because they couldn’t use any on the radio.
Finding the biggest number in a list
Where I’ll start is a list of numbers. We want to know what is the biggest number in that list. If it’s short enough, a human could probably just look at the list in one go and say what the biggest number is. However, we can’t assume we’ll be that lucky – either it’s a very long list or we’re getting a computer to do this for us.
This is something that most people can do:
- Start off by saying that the biggest number is the first number;
- Work your way along the list, and if the number you’re currently on is bigger than the biggest you’ve seen so far, the current number becomes the biggest number.
- At the end of the list you know what the biggest number is.
It’s probably a bit tedious spelling things out like that, but unfortunately that’s the level of pedantry and tedium that’s behind much work with computers.
We must visit every number in the list, just in case it’s the biggest one. We only need to visit each number once. These two together mean that if the list got three times as big, we would do three times as much work. If the list got ten times as big, we would do ten times as much work, and so on.
So, if the list had n things in it, the work would be proportional to n units big. I’m deliberately being vague about what big means here and so what its units are – time, memory etc. It doesn’t really matter as long as we’re consistent in what we say it means. So, I’m just going to forget about the units that work is measured in from now on. The important thing is that the work is proportional to n.
Sorting a list
We’ll now move onto a different problem that, together with the previous problem, will help us to introduce the main concept behind the question.
We have a list of numbers again, but instead of wanting to find the biggest we want to sort the list. (It doesn’t really matter if they end up smallest to biggest or biggest to smallest – the important thing is that they’re sorted.) This problem is harder than the previous one, but it might be hard to express how or why it’s harder.
One way to solve this problem is to take an approach called Bubble Sort. It’s not the best, but it’s the way that people often start when they’re learning to program and it’s a useful introduction to this problem. Just because we have to pick an order, I’ll decree that we want the list to be sorted smallest to biggest. Also, imagine that the numbers are in one long line or row, rather than one tall column.
You start with your left hand pointing at the first number in the list, and your right hand pointing at the second number in the list. If the list were sorted, we’d expect the left-hand number to be smaller than the right-hand number (or maybe the same as each other if the list had duplicates – the important thing is that the left-hand number won’t be bigger than the right-hand number).
If the left-hand number is bigger than the right hand-number, you swap them over. That’s all we’ll do for this pair of positions in the list for now. We then slide ourselves along the list by one position, so that our left hand points at the second number in the list and our right hand points at the third number in the list. The left-hand number might have only just moved into the second position in the list (if we swapped things over before) or it might have been there all along. We don’t care – we just go with whatever number is in the second position now.
We repeat the comparison as before – if the left-hand number is bigger than the right-hand number we swap them over. We then repeat this process (slide ourselves along one position, compare, swap if necessary) until we’ve compared and possibly swapped the last two numbers in the list.
There are two things we know when we’re at the end of going through the list:
- If there were n things in the list, we have done n-1 comparisons;
- The biggest number is definitely at the end of the list.
Because we need two items for a comparison, we can’t slide along such that our left hand is pointing to the last item in the list, as there would be nothing for our right hand to point to. So, we have to stop one short – with our left hand pointing at the n-1th item. Given that we started with it pointing at item 1, and we worked our way through all the items in between, it has pointed to n-1 different items.
The second one is a bit less obvious. The worst case is where the biggest number starts at position 1. After the first comparison, it’s swapped to be in position 2 because it’s bigger than the number that started off there. Similarly, when we shift along to comparing the numbers in positions 2 and 3, it will be swapped again. This continues all the way along the list – the biggest number is swept along as we move down the list, and so ends up in the last position.
So, we’ve done n-1 comparisons, and have 1 item in its correct place. We don’t know anything about the rest of the numbers in the list, other than none of them is bigger than the one at the end. As far as the rest of the work’s concerned, we can ignore the last position and the number in it, because they’re already sorted. This means that what’s left is a problem just like the one when we started with, except now the list we need to sort has n-1 numbers (in the first n-1 positions) rather than n.
If we started at the first position again and repeated the process over the first n-1 positions, we would do n-2 comparisons, and the biggest number from the first n-1 positions would be in its correct place (in position n-1, i.e. the end of the slightly shorter list we’ve just worked on).
So, each time through, we get one more item in its correct place, and we do one fewer comparison than the length of the list we look at.
If you carry this to its logical conclusion, we end up doing: (n-1) + (n-2) + … 3 + 2 + 1 comparisons to sort the whole list. The mathematician Gauss showed that the answer to this kind of sum is ½ * (n-1) * (n-2), or ½n2 – 1.5n + 1. So we can say that the difficulty of sorting a list of n numbers is (very roughly) proportional to n2.
It turns out that there are better ways to sort lists, such as Quick Sort, but the best still takes n * log(n) steps. Both ½n2 – 1.5n + 1 and n * log(n) are bigger than n (except when n=1), and grow even bigger as n gets bigger. This is the more mathematical way of expressing that sorting is a harder problem than finding the biggest number.
The important concept
Now that we have two different problems to compare, we can move to the important concept – complexity. In this context, complexity means how the difficulty of a problem varies with its size. In the case of finding the biggest number in a list, if the list were three times bigger, the solution would take three times as long to find. In the case of sorting a list using Bubble Sort, if the list were three times bigger, the solution would take (roughly) nine times as long to find.
There are many possible ways in which size and difficulty could be related, depending on the problem. The two examples we’ve seen so far can be described as polynomial. That is, the difficulty is a polynomial function of size. A polynomial is something you might remember from school maths – it’s things like 2x4 + 18x3 – 4x2 + 12x – 5 or 8x + 1.
I have so far skated over a key detail, but we can tackle it now. In the first example, the thing that you would get n of was comparing a number against another number (the biggest so far) and then updating the biggest so far if necessary. In the second example, the thing that you would get roughly n2 of is different: compare two numbers and swap them over if necessary.
When discussing complexity, this difference doesn’t matter. Complexity assumes that to solve a given kind of problem there will be a bit of constant size that you do no matter what (e.g. start by saying the biggest number so far is the first number in the list), and then another bit that you do more or less of depending on how big the problem is. The details of the thing you do more or less of don’t matter, just how the number of times you do this varies with the size of the problem.
Half-way to understanding the question
We have finally got the definition of half of the original question! The P in the question stands for polynomial. It actually means: the set of all problems where the best solution we’ve come up with means the difficulty is a polynomial function of size.
What’s the other half of the question?
NP stands for non-deterministic polynomial, which doesn’t make sense yet so I’ll try to explain it.
Imagine you had to build something that would solve a problem like finding the biggest number in a list of numbers. It could be a computer program, or maybe a machine made of Lego. At each step of the process, your program or machine would need to know what to do next.
In the examples I’ve given above – finding a maximum value in a list or sorting a list – what to do next is completely predictable in all situations. There might be a question to answer, e.g. is the left-hand number bigger than the right-hand number?, but the answer to that question is something you can always work out from the context. This kind of machine can therefore be described as deterministic. Its behaviour can be entirely predicted from how the world was when it started.
There are problems (I’ll give some examples shortly) where our best solution needs a machine built on magic. The machine is faced with a question it can’t answer given the information available. There is a set of options it needs to choose between, and it doesn’t know how to choose.
So, either there is magic such that it somehow knows which one to pick, or there’s magic such that it can clone itself loads of times and each option gets its own clone. Therefore each clone lives in a world where the decision didn’t really exist – the clone was always going to go down a particular path using its option, just as the other clones were always going to go down different paths with different options.
Such a machine is called a non-deterministic machine. We’re now in a position to define NP. NP is: the set of problems where the best solution we’ve come up with means the difficulty is a polynomial function of size, running on a non-deterministic machine.
[Since writing this article, I’ve written another one on Finite State Machines, that includes a bit more on the difference between deterministic and non-deterministic machines.]
Strictly speaking, P should be re-stated to make the difference between the two parts of the question clear.
- P: the set of all problems where the best solution we’ve come up with means the difficulty is a polynomial function of size, running on a deterministic machine;
- NP: the set of problems where the best solution we’ve come up with means the difficulty is a polynomial function of size, running on a non-deterministic machine.
So, finally, we can look at the question. Are these two sets the same? The important part is “the best solution we’ve come up with”. There are problems where at one time the best solution was an NP one, but later an alternative solution was discovered that was P. Once a problem has a P solution, it will stay in P forever. Problems that are currently NP might in the future move across to P, but we don’t know.
Instead of whittling away at individual problems that are currently NP, can we prove that in time all NP problems will turn into P problems? Or can we prove that some problems will always remain as NP problems?
Some example NP problems
One example NP problem is called the Travelling Salesperson Problem. It’s the problem you face when you have a list of errands to run, and you want to do them as quickly as possible. More formally: there is a set of places you need to visit, you want to visit each place exactly once, you want your journey to be as short as possible, and you’re constrained by the roads etc. that connect those places.
As the number of places (which, for this problem, is the value of n) grows, the number of possible routes grows extremely quickly. There is currently no P solution to it.
Another example NP problem is the one that touches on computer security as I mentioned in the introduction. If you have a number (actually, a whole number bigger than 0), can it be expressed as prime numbers multiplied together? More formally: what are the prime factors of the number? For example, 51 = 3 x 17, and 3 and 17 are prime numbers, so 3 and 17 are the prime factors of 51. 52 = 26 x 2, but 26 isn’t a prime number, so you end up with 52 = 13 x 2 x 2, therefore 2 (twice) and 13 are the prime factors of 52.
It’s trivial to check an answer – you simply multiply the factors together to see if you get the expected number, and you can look up the factors in a list of known prime numbers (or work out from scratch if they’re prime, which is a P problem). However, going from an arbitrary number to its prime factors is hard. In fact, it’s an NP problem.
The reason why this bit of maths affects computer security is that the difficulty of coming up with prime factors is the basis of much modern computer security. The problem with security is that there is a cost, and this cost needs to be balanced against the benefit. In World War 2, the Enigma machine needed a very large set of code books to be distributed to all the Enigma machine operators in many countries, all of which needed to be kept secret, and all operators had to follow the rules about how to use the code books (e.g. move to the next page at midnight).
To talk securely online to my bank, I don’t have a code book issued by the bank. Instead, there’s a web of computers that trust each other, and this trust is based on the fact that it’s currently too hard to work out the prime factors of big numbers. If I could work out these factors, I could for instance read messages that you and your bank send to each other, or I could impersonate your bank without your knowing.
NP problems with super-powers
Not all NP problems are equal. Within the set of NP problems is a smaller set of problems called NP complete. What’s special about an NP complete problem is that all other NP problems can be re-phrased in terms of it. That means solving any NP complete problem will solve all NP problems (not just all NP complete problems). The Travelling Salesperson Problem is an NP complete problem.
If P = NP, then we know that all current NP problems have a P solution out there somewhere. Given that NP solutions need magic (and the lack of magic takes a lot of effort to work around), knowing that there’s a non-magical solution out there will mean it’s worth continuing to look. P solutions might still take a long time to run, but in general are quicker than NP ones.
If P != NP, then we know at least some NP problems have no P solution, but we might not know which. It depends on the details of the proof that P != NP. If the proof just shows one counter-example – one NP problem that isn’t P – then that’s the least information. If the proof shows that one kind of NP problem, or all remaining NP problems aren’t actually P problems, then that gives us much more information.
If P = NP, and someone finds a solution to an NP complete problem, then suddenly an awful lot of problems have attractive (P) solutions. This is where computer security would look shaky, because it would be easier to find prime factors of big numbers.
If you’ve read this far – well done. These are fairly dry and abstract problems from the maths end of computing. However, they enable increasing amounts of everyday life, given how much computers have been integrated into so much of what we do.
To finally address the question P = NP? The answer is: we don’t know.