Ever since I realized coroutines are now a part of C++, I’ve wanted to understand them and use them. While there are different types of coroutines, the most prolific one is what is called the generator, which I’ve used and abused quite a bit by now.

What’s a generator?

Simply put (and technically incorrectly), a generator is a stateful range generator, much like std::views::iota, but one that is defined very similarly to a single function that pushes values into an output vector (or list, deque, etc.), making it very easy to write and read no matter how complex the logic is. For example:

/** Compute count numbers using a vector. */
auto computeVector(int count)
    -> std::vector<int>
{
    auto result = std::vector<int>{}; // Create result object
    for (const int value : std::views::iota(0, count)) {
        result.push_back(value); // Push values
    }
    return result; // Return all values
}

/** Compute count numbers using a generator. */
auto computeGenerator(int count)
    -> std::generator<int>
{
    // "result" object created by compiler
    for (const int value : std::views::iota(0, count)) {
        co_yield value; // Push values
    }
    // Return to mark the end of the range
}

Both functions look very similar and have very similar interfaces: an integer in, a range of integers out. However, they have significantly different implementations and implications. For now the important parts are that the output is a range of values (much like a std::vector) and that instead of calling push_back() or some other function, we use the co_yield keyword. This act of “pushing values” using the co_yield keyword is called yielding a value.

How exactly the class behaves and how it is implemented is outside the scope of this blog.

Type erasure

On several occasions, I’ve been faced with the question of what types to put in the parameter list of a function. I want as broad an input space as possible, and when taking in ranges this boils down to a choice between publishing internals as a template or using type erasure. But how does one erase the type of a range while still having a useful range? Standard library classes like std::span are great when they’re applicable, but if all I want is an input range (iterable from begin to end, once), then a span (a contiguous range) is too restrictive.

One option is to define an implementing function that takes in some kind of a “next element, please” function as input:

void myDoStuffImpl1(const std::function<Type()>& nextElementPlease)
{
    for (
        auto value = nextElementPlease(); 
        /* ??? */; 
        value = nextElementPlease()
        ) {
        /*compute*/
    }
}

But what goes in the ??? part? I could wrap each value in a std::optional or use a pointer instead. Neither option is great: the optional adds more overhead and wrapping, while a pointer may run into lifetime issues, as anything created inside the function is probably no longer alive by the time we get to it. Moreover, the interface is quite tedious to write, as we need to keep track of the current iterator and the end and stuff all of this into a function object.

If we take in a generator instead, we can iterate over it with a normal range-for loop and our template wrapper also becomes a bit simpler:

// defined in its own translation unit (source file)
void myDoStuffImpl2(std::generator<Type> generator)
{
    for (auto&& value : generator) {
        /*compute*/
    }
}

// defined in a header
template <std::ranges::range Range>
void myDoStuff(Range&& range)
{
    myDoStuffImpl2([&]() -> std::generator<Type> {
        co_yield std::ranges::elements_of(std::forward<Range>(range));
    }()); // <- note calling the lambda
}

Yes, the interface part is quite ugly, as is tradition for generic template programming in C++. However, it eats any range we might want to throw at it, and quite happily at that, which means callers need not fight with my interface: I’ll just accept whatever they have. The number of times I’ve had the needed data in a std::set or something and the function I needed to call only accepted a std::vector is too high. This of course isn’t free, but the performance impact is probably not significant; at least I’ve not run into problems where the use of a generator was the bottleneck.

Fun fact: The implementation of yielding a std::ranges::elements_of of a range in std::generator (probably) uses this mechanism of erasing type by wrapping it in a std::generator.

Flattened call stack

Writing some part of a bigger application is a challenge in its own right. Often there is a need to process a range of objects in two or more steps, and with user-facing applications the user often wants to be able to cancel the processing of this range of objects. Implementing this in a simple way tends to pollute my beautiful free functions with integrations to the rest of the system. It would be great to have interruptible processing, but not necessarily always on a separate thread (where the interruption would amount to terminating said thread).

As generators are coroutines, they co-operate with the caller (or other coroutines) on who gets to run and when. In my generator, once I’m done creating a value, I can pass control back to the caller in full. That is, back onto the caller’s stack frame: the caller is back in control and there is no extra nesting. The traditional method of passing control back to the caller is a callback (optionally with client data, though not always). Even with type erasure using std::function, I still need to know what to accept, probably something like a std::function<bool()> called keepGoing. And then I have to remember to call it in all the right places.

But what if I give back control for every value I produce? Then I don’t need to care what the cancellation mechanism is, or even whether there is one in the first place. Granted, the process can no longer be cancelled during the creation of a value, only once a value is produced, but in my defence cancelling mid-value would likely hit a bug anyway, if it were even possible. When writing a generator I only need to make sure that no lock, or anything else not guarded by RAII, is held across a suspension point (a co_yield statement), and then any cancellation mechanism (e.g. early return, exception) will work just fine to cancel my generator.

/** Makes a bunch of values that are passed back as they are generated. */
auto makeValuesGenerator()
    -> std::generator<std::string>;

/** Some stateful computation.
 *  Returns false when no more input is accepted. */
auto compute(std::string& accumulator, const std::string& value)
    -> bool;

auto processOnlyNecessary()
    -> std::string
{
    auto result = std::string{};
    for (const auto& value : makeValuesGenerator()) {
        if (not compute(result, value)) {
            break; // The generator is destroyed and its 
                   // stack's destructors are called.
        }
    }
    return result;
}

Bonus points: if I was already going to use a generator to reduce memory overhead from O(n) with a vector to O(1), I don’t need to make any changes to my pure and self-contained generator.

Decoupling

The situation that prompted me to write this post came up when I was working with a pipeline with two parts: create values from a file and consume them (output to another file). This may sound very simple, but I assure you the logic in both parts is very much nontrivial. It makes sense to split the pipeline strongly in two. A great benefit of this is that I can then run each part in its own thread, for up to 50 % shorter runtime. However, I also suspect there may sometimes be bugs that only happen when running my code in multiple threads, so I need to be able to run the parts in a single thread as well.

If I were to use a message queue or something similar to support the multithreaded setting, I’d have to put a serialization point between the parts and collect all the values up front for single-threaded runs, since neither part would support co-operative execution. But with generators they do! And I don’t need to decide on what kind of message queue to use to connect them, as the first part can be a generator and the second part can take a generator as a parameter:

auto createValues(const std::filesystem::path& fileIn)
    -> std::generator<Value>;

void consumeValues(
    const std::filesystem::path& fileOut,
    std::generator<Value> values
    );

Now I can use any method I want to glue these parts together, choosing at runtime, with the same creating and consuming code running in both cases. Additionally, it’s much easier to test now that I don’t have to mock reading from a message queue. If we assume a magical message queue class template called Channel that allows pushing values into the queue and reading them back in a range-for loop, we can get something like this:

auto createValues(const std::filesystem::path& fileIn)
    -> std::generator<Value>;

void consumeValues(
    const std::filesystem::path& fileOut,
    std::generator<Value> values
    );

void runSingleThreaded(
    const std::filesystem::path& fileIn,
    const std::filesystem::path& fileOut
    )
{
    consumeValues(fileOut, createValues(fileIn));
}

void runMultithreaded(
    const std::filesystem::path& fileIn,
    const std::filesystem::path& fileOut
    )
{
    auto createChannel = Channel<Value>{};
    auto ready = std::latch{2};

    auto createThread = std::jthread{[&] {
        for (const auto& input : createValues(fileIn)) {
            createChannel.push(input);
        }
        createChannel.close();
        ready.count_down();
    }};

    auto consumeThread = std::jthread{[&] {
        consumeValues(fileOut, [&]() -> std::generator<Value> {
            co_yield std::ranges::elements_of(createChannel);
        }());
        ready.count_down();
    }};

    /* Do some multithread monitoring or something
     * while we wait for the two threads to finish. */
    ready.wait();
}

The beautiful thing here is that neither createValues nor consumeValues knows about threading or channels; all they care about is creating and consuming values, respectively. Of course, a generator isn’t a silver bullet, and relying on just a generator for input leaves some ambiguity when that range ends abruptly.

Concluding remarks

While I’ve yet to find good uses for any other type of coroutine (e.g. task), generators have been wonderful to work with, to the annoyance of my coworkers. While this blog focuses on C++ and more specifically C++20 and later, because that’s what I like and what I write, these ideas should extend to other languages with similar functionality.

Notes

The way objects are passed out of generators, between threads, etc. is not optimal. There may also be some other subtle bugs in the snippets. I wrote this in one go without a compiler or linter and couldn’t be bothered to optimize all the code.