Recursion and iteration – loop unrolling

This article is part of a series on recursion and iteration:

  1. Introduction to iteration
  2. Introduction to recursion
  3. Mutual recursion and tail recursion
  4. Blurring the lines
  5. Loop unrolling

In this article I will talk about loop unrolling, and the wonder / horror that is Duff’s device.

Loop unrolling

Loop unrolling is something to do only when you’re very pressed for improving your code’s performance, and there are no better options available to you. It’s a way to reduce the time the code spends on housekeeping, while still doing the same more important work.

If you have this for loop

for(int loopCounter=0; loopCounter < 10000; loopCounter++)
{
   doSomethingImportant();
}

and think about the sequence of operations it’s doing when it executes, it’s going to be something like this (assuming some hand-waving operation stop):

  1. loopCounter = 0;
  2. if (loopCounter >= 10000) {stop}
  3. doSomethingImportant()
  4. loopCounter++
  5. if (loopCounter >= 10000) {stop}
  6. doSomethingImportant()
  7. loopCounter++
  8. if (loopCounter >= 10000) {stop}
  9. doSomethingImportant()
  10. loopCounter++

Apart from the one-off cost of the first line, one operation in three is doing something important and the remainder are just doing housekeeping for the loop. If the time cost of doSomethingImportant() is big compared to the other two operations, this might be OK. However, in some situations, the overhead of loop housekeeping might be significant.

Loop unrolling tries to reduce the amount of time doing loop housekeeping, by doing it much less often. This is the smallest version of loop unrolling:

for(int loopCounter = 0; loopCounter < 5000; loopCounter++)
{
   doSomethingImportant();
   doSomethingImportant();
}

Two matching changes have been made:

  1. The loop will iterate half as many times (5,000 rather than 10,000)
  2. Each iteration will do double the amount of work

So doSomethingImportant() will still be called 10,000 times. However, if you look at the operations that are executed, there’s an important change:

  1. loopCounter = 0
  2. if (loopCounter >= 5000) {stop}
  3. doSomethingImportant()
  4. doSomethingImportant()
  5. loopCounter++
  6. if (loopCounter >= 5000) {stop}
  7. doSomethingImportant()
  8. doSomethingImportant()
  9. loopCounter++
  10. if (loopCounter >= 5000) {stop}
  11. doSomethingImportant()
  12. doSomethingImportant()
  13. loopCounter++

If you ignore the one-off cost of initialising loopCounter, now 50% of the operations are doing something important, rather than 33% as before. If we pushed this further, we could e.g. have 10 copies of doSomethingImportant() in the loop, and stop the loop after only 1,000.

This is OK if you know that you will always want to do 10,000 iterations. What if the number of iterations is in a variable? In that case you need two loops – an unrolled loop that will do N lumps of work in work go, and a normal loop that will do 0 to N-1 iterations. If you have N = 3, and the total number of iterations is in the variable totalNumIterations, then it would look like this:

// do as many lumps of 3 as you can
for(int loopCounter = 0; loopCounter < totalNumIterations / 3; loopCounter++)
{
   doSomethingImportant();
   doSomethingImportant();
   doSomethingImportant();
}

// do the remaining iterations
for(int secondLoopCounter = 0; secondLoopCounter < totalNumIterations % 3; secondLoopCounter++)
{
   doSomethingImportant();
}

If totalNumIterations is 14, then the first loop will execute 14 / 3 = 4 times, and each loop will call doSomethingImportant 3 times, giving a total of 12 calls. Then the second loop will execute 14 mod 3 = 2 times, each loop doing doSomethingImportant once. The total is 12 + 2 = 14 calls to doSomethingImportant.

There are at least two problems with loop unrolling:

  1. It makes maintenance harder. If in the future the requirements change, so that doSomethingImportant needs to change to doSomethingElse, the programmer doing the work will search for where doSomethingImportant is being called in a loop. They might easily spot only the first or second loops but miss the other one (because they’re not expecting two loops so close to each other as loop unrolling is relatively rare). If they change only one, that will introduce a bug.
  2. It’s not portable. There are important low-level details that matter here – how big is the instruction cache of the chip that will execute this code? If you go for aggressive loop unrolling, e.g. 100 instances inside the first loop, then you run the risk that the code will actually run more slowly, due to instruction cache misses.

Duff’s device

Duff’s device is a more advanced version of loop unrolling, that aims to address the maintenance problem I mentioned above. I have only come across it in the context of C – I don’t know if it or something like it is possible in other languages. Here it is, based on unrolling the loop to have 8 instances of the operation in one big loop.

int numLoops = totalNumIterations / 8;
switch (totalNumIterations % 8) {
case 0: do { doSomethingImportant();
case 7: doSomethingImportant();
case 6: doSomethingImportant();
case 5: doSomethingImportant();
case 4: doSomethingImportant();
case 3: doSomethingImportant();
case 2: doSomethingImportant();
case 1: doSomethingImportant();
} while (--numLoops > 0)
}

The basic structure is a do / while loop merged with a switch statement, as if they were victims of the same transporter accident in Star Trek. An important part of it is the fact that the switch statement has no break statements on any of the cases, so as soon as execution has jumped to a case statement (based on totalNumIterations % 8) it will execute the doSomethingImportant at that case statement and all later case statements. So, if totalNumIterations % 8 = 6, execution will jump to line 5, make the doSomethingImportant() call there, plus the other 5 calls to doSomethingImportant on the lower case statements.

As the switch statement comes before the start of the do / while, it is executed first. This jumps execution to the relevant case and so 0-7 instances of doSomethingImportant are called. It then hits the close curly bracket on line 11, which is part of the do / while loop. This then triggers the do / while behaviour. The do / while loop effectively ignores the switch statement and its cases, and so each time around the do / while loop will do all 8 instances of doSomethingImportant.

To my mind, if loop unrolling is bad, then this is even worse. If you ever use it, please put a large and hard-to-miss comment saying that this is Duff’s device, so the poor soul reading the code can at least be warned for what they should Google to make sense of it.

Summary

Loop unrolling is a technique for when the performance of a loop is definitely hitting overall performance. (Check that this is true using performance tools, rather than guessing.) Duff’s device is a way of addressing one of the risks of loop unrolling, at the cost of making code that is incredibly hard to understand.

 

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s