Optimisation part 2: Hill climbing and simulated annealing

In the previous article I introduced optimisation.  In this article I will go into two optimisation algorithms – hill climbing and simulated annealing.  Hill climbing is the simpler one, so I’ll start with that and then show how simulated annealing can help overcome its limitations, at least some of the time.

Hill climbing

To explain hill climbing I’m going to reduce the problem we’re trying to solve to its simplest case.  Imagine that you have a single parameter whose value you can vary, and you’re trying to pick the best value.  For instance: how long you should toast some bread to make the perfect slice, or how much cayenne to add to a chili.

You can then think of all the options as different distances along the x axis of a graph.  How good the outcome is for each option (each option’s score) is the value on the y axis.  As I said before, this is the simplest it can be: problems can easily have many more than one dimension, but it’s hard to visualise an N-dimensional space on a 2D screen.

Imagine that the graph you get for your example looks like this:

A graph showing an upside-down parabola – a single smooth hill

Remember from the previous article: we can’t assume that there’s a nice equation that relates the x and y values, so we can’t take short-cuts such as differentiation or factorising.  I just happen to have picked a parabola because it’s easy to produce.

With hill climbing what you do is:

  1. Pick a starting option (this could be at random).
  2. Come up with a candidate next option based on your current option. For instance, change the x value (e.g. length of time toasting the bread) by a random number in the range -10 seconds to +10 seconds.
  3. If the candidate option is better than the current option, adopt the candidate as your new current option.
  4. Go back to 2 to see if you can improve things even further.

In the case of the example above, step 1 will start things at some point on the graph (on the surface of the hill it represents) – you don’t know if you’re on the left-hand side or right-hand side of the hill.

You then randomly pick a next option that’s to the left or right of where you currently are.  If that happens to move you up the hill, you make that move.  If it would instead make you move downhill, you ignore that and try again.

Steps 3 and 4 mean that there’s a ratchet at work – you only ever move if it will improve things.
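
As a sketch in C# (Score, random, current and the ±10-second step size are my own illustrative choices following the toasting example, not fixed parts of the algorithm):

    // Steps 2-4: repeatedly propose a nearby candidate and keep it
    // only if it improves the score (the ratchet).
    while (true)
    {
        double candidate = current + (random.NextDouble() * 20.0 - 10.0);
        if (Score(candidate) > Score(current))
        {
            current = candidate;
        }
    }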

Sharp-eyed readers will notice that the algorithm as presented above will never stop.  The trouble is, you don’t know if your failure to improve is because you’re at the top of the hill (so all alternatives would make things worse), or because you’re still on the side of the hill and all your random jumps happen to have been downhill.

So you can get the looping to stop after either N times around the loop, or when the last improvement was smaller than X (or some other condition like that).
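
Here’s the same loop with both kinds of stopping condition added (again just a sketch – MaxIterations and MinImprovement are illustrative values):

    const int MaxIterations = 10000;      // give up after N times around the loop
    const double MinImprovement = 0.0001; // stop once improvements get this small

    double currentScore = Score(current);
    for (int i = 0; i < MaxIterations; i++)
    {
        double candidate = current + (random.NextDouble() * 20.0 - 10.0);
        double candidateScore = Score(candidate);
        if (candidateScore > currentScore)
        {
            double improvement = candidateScore - currentScore;
            current = candidate;
            currentScore = candidateScore;
            if (improvement < MinImprovement)
            {
                break; // the last improvement was smaller than X
            }
        }
    }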

Note that this approach will work in the opposite direction too.  If the value of each option is something you’re trying to minimise rather than maximise, i.e. the value is the option’s cost to you rather than its benefit to you, then you simply flip things around.  You go to a new option only if its value is less than that of the option you currently have.  This is hill descent rather than hill climbing, but it’s basically the same process.
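
In code, the only change is the direction of the comparison (using a cost in place of a score):

    if (candidate_cost < current_cost)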

The things you can change to influence the behaviour of hill climbing (assuming the score of each option is fixed) are:

  1. How the first option is generated;
  2. How option N+1 is generated from option N;
  3. How the algorithm knows to stop.

The problem with hill climbing

This seems like a great approach – you home in on the best option, and can control when you give up.  What’s the problem?  Things are fine until you have a graph that looks more like this:

A graph showing a curve with more than one peak

The key thing is that there’s now more than one maximum.  You could think of it as a big mountain plus foothills around it, and you want to get to the top of the mountain.  In more maths-y terms: you want the global maximum rather than a local maximum.

If you happen to land on the slope of a foothill in your first option, you will climb up to the top of that foothill and get stuck there.  In that case, you would have to go downhill from your first option in order to get off the side of the foothill and onto the side of the mountain across the valley.  That’s something that hill climbing forbids.

In fact it might be much worse than this: imagine your problem is one where the graph has a mountain on the left, with a series of six foothills to its right, and you happen to land on the right-most foothill.  You would need to go downhill many times in order to get off each of the six foothills in turn.

Simulated annealing to the rescue

As Marvin Gaye once sang: When I get that feeling, I want simulated annealing.

The inspiration for this is (actual) annealing.  Annealing is the process that happens when e.g. glass cools.  You can think of the molten glass as being made up of particles that jiggle about – the hotter the glass is, the greater the jiggling.  As the glass cools, it aims to pack these particles together as tightly as possible.  Two neighbouring particles might not be a good fit on their first approach; they need to move apart and rearrange themselves in some way (e.g. turn around a bit) so that they fit together more closely when they next meet.

The motion of the particles is basically random, except the maximum size of the moves drops as the glass cools.  Cooling glass leads to interesting things like Prince Rupert’s drops (which form when glass is quenched too quickly to anneal), and annealing can be used as inspiration for improving hill climbing.

How simulated annealing improves hill climbing

You can think of hill climbing as being a single tactic – improving.  With simulated annealing we will use two – exploring and improving.  The basic idea is that we want to start off exploring the range of options by making quite big jumps (sometimes in the “wrong” direction) and then gradually shift into improving with a tighter and tighter focus on where we currently are.  This should make it more likely that we can jump from a foothill to the mountain (in the exploring phase) and then climb up the mountain (in the improving phase).

We will have a value that we call temperature, which is a positive number that starts high and drops towards zero over time (as in the cooling glass of real annealing).  When we come up with a candidate next option, instead of a simple comparison

    if (candidate_score > current_score)

we’ll use this instead:

    if (acceptance_function(temperature, candidate_score - current_score))

What’s the definition of acceptance_function?  If candidate_score - current_score is positive, i.e. the candidate is better, then always accept it (as in hill climbing).  The important bit is when the candidate is worse.  For that you could do something like:

    double threshold = temperature / (MAX_TEMP * (current_score - candidate_score));
    double randomValue = random.NextDouble();
    bool accept_result = randomValue < threshold;

randomValue is always in the range 0 to 1, and threshold usually is too (if it ever exceeds 1, the backwards step is simply always accepted).  The bigger temperature is, the more likely we are to accept a backwards step.  The worse the backwards step is, the less likely we are to accept it.
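
Filled out as a complete function, acceptance_function might look like this (a sketch – MAX_TEMP and random are assumed to be defined elsewhere):

    // Decides whether to move to the candidate option.  scoreDelta is
    // candidate_score - current_score, so it is negative for a backwards step.
    bool acceptance_function(double temperature, double scoreDelta)
    {
        // A better (or equal) candidate is always accepted, as in hill climbing.
        if (scoreDelta >= 0)
        {
            return true;
        }

        // Otherwise accept with a probability that rises with temperature
        // and falls with how much worse the candidate is.
        double threshold = temperature / (MAX_TEMP * -scoreDelta);
        double randomValue = random.NextDouble();
        return randomValue < threshold;
    }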

A detail I’m skipping is how the temperature is generated, as it’s not all that interesting.  The more quickly the temperature drops, the more quickly simulated annealing tends towards behaving like hill climbing.
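
One common choice (my assumption – the article doesn’t prescribe a schedule) is geometric cooling, where each trip around the loop multiplies the temperature by a constant just under 1:

    // Geometric cooling: a factor close to 1 cools slowly (more exploring);
    // a smaller factor approaches plain hill climbing more quickly.
    temperature *= 0.99;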

The total list of things we can change to influence the behaviour of simulated annealing is the list from hill climbing plus two additions:

  1. How the first option is generated;
  2. How option N+1 is generated from option N;
  3. How the algorithm knows to stop;
  4. How the temperature is generated for a given iteration;
  5. How the algorithm knows to accept a backwards step (the definition of acceptance_function).
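
Pulling all five together, a simulated annealing loop might look like this (a sketch only – Score, the starting range, the step size and the cooling factor are all illustrative assumptions):

    const double MAX_TEMP = 100.0;
    const int MaxIterations = 10000;

    var random = new Random();
    double current = random.NextDouble() * 300.0;  // 1. generate the first option
    double currentScore = Score(current);
    double temperature = MAX_TEMP;

    for (int i = 0; i < MaxIterations; i++)        // 3. how the algorithm stops
    {
        // 2. generate option N+1 from option N
        double candidate = current + (random.NextDouble() * 20.0 - 10.0);
        double candidateScore = Score(candidate);

        // 5. accept every improvement, and some backwards steps
        if (acceptance_function(temperature, candidateScore - currentScore))
        {
            current = candidate;
            currentScore = candidateScore;
        }

        temperature *= 0.99;                       // 4. temperature drops towards zero
    }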

Summary

If you are trying to optimise something where there is a single maximum, hill climbing will help you find it.  If there’s more than one maximum and you want the global one, then simulated annealing is a more robust approach.
