Captured variables

This article is about captured variables in C#.  In case you’ve not come across them before: a captured variable is one declared outside a piece of code (such as a lambda or a local function) that the code appears to capture and drag along with it, so that the code can keep using the variable long after it looks as though this should be possible.

They’re not something I use very often, but they’re worth knowing about for two reasons:

  1. They can provide a tool for moving data around, which might be better in some circumstances than the alternatives, such as declaring a new class to hold the code that would otherwise capture variables.
  2. Their behaviour is weird, particularly until you understand the magic that the compiler is doing behind the scenes on your behalf.  This behaviour involves side-effects that you might not expect, leading to hard-to-fix bugs.

I’m going to illustrate them in two stages.  In the first stage I’ll give the simplest example I can think of that illustrates this magic and its consequences.  In the second stage I’ll build on the simple example to reinforce the scope and nature of the magic.

You could think of this article as having the sub-title: Who knows where the data goes?   Yes, that’s a way to shoe-horn in a link to a lovely bit of music.

Code overview

The first example involves a sequence.  Sequences can often be found in databases – they’re things that spit out the next number in a sequence, e.g. 10, 15, 20, 25…  When you create them, you define a starting value and the size of the step between one value and the next – in this case 10 and 5.  Then each time you ask the sequence for a number it will give you the next one.

I won’t create a sequence directly; instead there will be a factory that creates one.  So, to get the series of numbers I will:

  1. Create a sequence factory
  2. Use the sequence factory to create a sequence
  3. Use the sequence to create the numbers

This is a bit contrived, but it helps to make the point of the article.  The factory will be implemented as a class, and the sequence that it returns will be a Func<int> rather than a whole class.  It is this Func<int> that will capture variables from its environment.  The environment will be a combination of the factory class and the method in the factory that creates the sequence.

A key point I want to make is that it’s variables that are captured and not their values.  We aren’t taking a snapshot of the variable at one point in time.  Instead we are capturing a link to the variable, whose value can change over time (just like any other variable).
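
To see this point in isolation, here is a minimal sketch that is separate from the sequence example.  The names CaptureDemo, Run and readCounter are made up for illustration.  A lambda is created while the variable holds 1, the variable is changed afterwards, and only then is the lambda called.

```csharp
using System;

public static class CaptureDemo
{
    // Returns what the lambda sees after the captured variable has changed.
    public static int Run()
    {
        int counter = 1;

        // The lambda captures the variable counter, not the value 1.
        Func<int> readCounter = () => counter;

        // Change the variable after the lambda has been created.
        counter = 42;

        // The lambda reads the variable as it is now.
        return readCounter();
    }

    public static void Main()
    {
        Console.WriteLine(Run()); // prints 42
    }
}
```

If the value had been captured, this would print 1.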

Sequence factory

This is the sequence factory.  Note that the step value is passed in to the factory’s constructor and stored in a (class-level) property.  The starting value is passed in to the method that returns the sequence, and stored in a local variable in that method.  Notice that the sequence has no state of its own (it declares no variables) – it relies on variables that are declared outside of itself (which are therefore captured).

Another thing to note is that in the GetSequence() method, after the sequence has been created but before it has been returned, the variables the sequence relies on are both incremented.  This is the part that drives home the fact that it’s variables that are captured and not just their values.

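    using System;

    public class SequenceFactory
    {
        // Class-level state: shared by every sequence this factory creates.
        public int Step { get; set; }

        public SequenceFactory(int step)
        {
            Step = step;
        }

        public Func<int> GetSequence(int startingValue)
        {
            // Local state: a fresh variable for each call to GetSequence().
            int next = startingValue;

            Console.WriteLine($"In GetSequence, before creating local function, next = {next} and Step = {Step}");

            // The sequence itself.  It declares no variables of its own,
            // so it captures next and reaches Step via the factory.
            int result()
            {
                next += Step;
                return next;
            }

            // Change both variables after the local function has been created.
            Step += 50;
            next += 2;

            Console.WriteLine($"In GetSequence, after creating local function to capture the variables and incrementing both variables, next = {next} and Step = {Step}");

            return result;
        }
    }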

Using the factory and sequence

This is the code that creates the factory, uses the factory to create the sequence, and then uses the sequence to create two numbers.

	var factory = new SequenceFactory(1);

	var sequence = factory.GetSequence(3);

	int firstValue = sequence();

	Console.WriteLine($"First value from sequence = {firstValue}");

	int secondValue = sequence();

	Console.WriteLine($"Second value from sequence = {secondValue}");

The output this produces is:

  1. In GetSequence, before creating local function, next = 3 and Step = 1
  2. In GetSequence, after creating local function to capture the variables and incrementing both variables, next = 5 and Step = 51
  3. First value from sequence = 56
  4. Second value from sequence = 107

The first line shows the variables when their values are as you would expect, given the values passed to the factory.  The second line shows how the GetSequence() method has messed about with the variables after the sequence has been created.  If the sequence captured values rather than variables, this would have no effect on the behaviour of the sequence as it happens after the sequence was created.

However, the third line shows that this isn’t true.  If values were captured, the first number the sequence returned would be 4 (3 + 1).  Instead it’s 56 (5 + 51).  The fourth line continues in this vein.  Instead of the second number being 1 bigger than the first, it’s 51 bigger.
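
For contrast, this is roughly what you could write if you wanted the snapshot behaviour.  It is a sketch and not part of the example above: SnapshotSequenceFactory and SnapshotDemo are hypothetical names, and the only changes are that Step is copied into a local variable and the locals are left alone once the local function exists.

```csharp
using System;

// Hypothetical variant of SequenceFactory, for comparison only.
public class SnapshotSequenceFactory
{
    public int Step { get; set; }

    public SnapshotSequenceFactory(int step)
    {
        Step = step;
    }

    public Func<int> GetSequence(int startingValue)
    {
        // Copy both values into locals that nothing else will touch.
        int next = startingValue;
        int step = Step;

        int result()
        {
            next += step;   // uses the local copy, not the Step property
            return next;
        }

        // Changing the property now has no effect on the sequence,
        // because the sequence captured the local step, not Step.
        Step += 50;

        return result;
    }
}

public static class SnapshotDemo
{
    public static void Main()
    {
        var factory = new SnapshotSequenceFactory(1);
        var sequence = factory.GetSequence(3);

        Console.WriteLine(sequence()); // prints 4
        Console.WriteLine(sequence()); // prints 5
    }
}
```

Note that this still captures variables rather than values.  It only looks like a snapshot because nothing outside the local function changes next or step after they have been captured.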

Further proof of weirdness – two sequences from one factory

We’ll build on the example above now.  The factory and sequence will be the same, but we will use them in a slightly more complicated way.  We will use the factory to create two sequences, and then interleave getting values from the sequences.  This will show that not only are the sequences capturing variables from their environment, but because their environments overlap, their captured variables overlap, resulting in a weird interaction between the sequences.

Here is the more complicated code that creates and uses sequences:

    var factory = new SequenceFactory(1);

	
    var sequence1 = factory.GetSequence(3);

    int firstValueFromSequence1 = sequence1();

    Console.WriteLine($"First value from sequence 1 = {firstValueFromSequence1}");

    int secondValueFromSequence1 = sequence1();
	
    Console.WriteLine($"Second value from sequence 1 = {secondValueFromSequence1}");
	

    var sequence2 = factory.GetSequence(20);

    int firstValueFromSequence2 = sequence2();

    Console.WriteLine($"First value from sequence 2 = {firstValueFromSequence2}");

    int thirdValueFromSequence1 = sequence1();

    Console.WriteLine($"Third value from sequence 1 = {thirdValueFromSequence1}");

It produces this output:

  1. In GetSequence, before creating local function, next = 3 and Step = 1
  2. In GetSequence, after creating local function to capture the variables and incrementing both variables, next = 5 and Step = 51
  3. First value from sequence 1 = 56
  4. Second value from sequence 1 = 107
  5. In GetSequence, before creating local function, next = 20 and Step = 51
  6. In GetSequence, after creating local function to capture the variables and incrementing both variables, next = 22 and Step = 101
  7. First value from sequence 2 = 123
  8. Third value from sequence 1 = 208

Lines 1-4 are as in the previous example.  Lines 5-7 are similar, but for the second sequence, and only one value is fetched from it rather than two.  Note that it uses a different starting value (20 rather than 3), which becomes its own next variable.  However, the step value is defined at the factory level and so is shared by the two sequences.  In line 6, when the step value is incremented as part of creating the second sequence, the change affects the first sequence as well.  You can tell this because the next value produced by the first sequence after the second sequence was created (the third value from the first sequence, 208) is 101 bigger than the previous value from the first sequence (107), not 51 bigger.

So, it looks like the two sequences are:

  1. Each capturing their own next variable
  2. Sharing one captured Step variable

What’s going on?

Behind the scenes, the compiler generates a hidden class (often called a closure class or display class) with a field for each captured local variable.  It creates an instance of this class, i.e. an object, and wherever the code uses the captured variable it uses the field on that object instead.  The delegate returned from GetSequence() holds a reference to the object, so the object survives garbage collection for as long as the delegate itself is reachable.

Because the captured variable next is a method’s local variable, the two invocations of the method (to get the two sequences) will each get their own object, meaning the two sequences will capture a different instance of the next variable, leading to them having independent values.

This contrasts with Step, which is a class-level property.  Step is not copied into the compiler-generated object.  Instead the local function captures this, i.e. a reference to the factory, and reads Step through that reference.  Because the two invocations of GetSequence() are on the same factory, both sequences refer to the same Step.  This means that when Step’s value is changed while creating one sequence, the change also affects the other sequence.
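
A hand-written approximation can make this concrete.  This is a sketch and not the compiler’s real output: the generated class has a name you cannot type and different field names, and SequenceClosure, ManualSequenceFactory and ManualDemo are names invented here.  It assumes that the closure object holds next as a field and reaches Step through a reference to the factory that created it.  The console logging from the original factory is left out to keep it short.

```csharp
using System;

// Hypothetical stand-in for the class the compiler generates.
public class SequenceClosure
{
    public int Next;                                 // the captured local variable
    private readonly ManualSequenceFactory factory;  // the captured 'this'

    public SequenceClosure(int next, ManualSequenceFactory factory)
    {
        Next = next;
        this.factory = factory;
    }

    // The body of the local function, moved onto the closure class.
    public int Result()
    {
        Next += factory.Step;
        return Next;
    }
}

public class ManualSequenceFactory
{
    public int Step { get; set; }

    public ManualSequenceFactory(int step)
    {
        Step = step;
    }

    public Func<int> GetSequence(int startingValue)
    {
        // One closure object per call, so each sequence gets its own Next.
        var closure = new SequenceClosure(startingValue, this);

        Step += 50;         // shared: lives on the factory
        closure.Next += 2;  // private to this call: lives on the closure

        // The delegate keeps the closure (and through it the factory) alive.
        return closure.Result;
    }
}

public static class ManualDemo
{
    public static void Main()
    {
        var factory = new ManualSequenceFactory(1);
        var sequence1 = factory.GetSequence(3);

        Console.WriteLine(sequence1()); // prints 56
        Console.WriteLine(sequence1()); // prints 107

        var sequence2 = factory.GetSequence(20);

        Console.WriteLine(sequence2()); // prints 123
        Console.WriteLine(sequence1()); // prints 208
    }
}
```

The numbers match the output from the two-sequence example: each call to GetSequence() creates a new closure object with its own Next, while every closure points back at the one factory and its single Step.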
