Love letter to generators
Ever since I realized coroutines are now a part of C++, I’ve wanted to understand them and use them. While there are different types of coroutines, the most common one is the so-called generator, which I’ve used and abused quite a bit by now.
What’s a generator?
Simply (and technically incorrectly) put, a generator is a stateful range generator, much like std::views::iota, but one that is defined very similarly to a function that pushes values into an output vector (or list, deque, etc.), thus making it very easy to write and read no matter how complex the logic is. For example:
/** Compute count numbers using a vector. */
auto computeVector(int count)
    -> std::vector<int>
{
    auto result = std::vector<int>{}; // Create result object
    for (const int value : std::views::iota(0, count)) {
        result.push_back(value); // Push values
    }
    return result; // Return all values
}

/** Compute count numbers using a generator. */
auto computeGenerator(int count)
    -> std::generator<int>
{
    // "result" object created by compiler
    for (const int value : std::views::iota(0, count)) {
        co_yield value; // Push values
    }
    // Return to mark the end of the range
}
Both functions look very similar and have very similar interfaces: an integer in, a range of integers out. However, they have significantly different implementations and implications. For now the important parts are that the output is a range of values (much like a std::vector) and that instead of calling push_back() or some other function, we use the co_yield keyword. This act of “pushing values” with the co_yield keyword is called yielding a value.
How exactly the class behaves and how it is implemented is outside the scope of this blog.
Type erasure
On several occasions I’ve been faced with the question of what types to put in the parameter list of a function. I want as broad an input space as possible, and when taking in ranges this boils down to a choice between publishing internals as a template and using type erasure. But how does one erase the type of a range while still having a useful range? Standard library classes like std::span are great when they’re applicable, but if all I want is an input range (iterable from begin to end, once), then a span (a contiguous range) is too restrictive.
One option is to define an implementing function that takes in some kind of a “next element, please” function as input:
void myDoStuffImpl1(const std::function<Type()>& nextElementPlease)
{
    for (
        auto value = nextElementPlease();
        /* ??? */;
        value = nextElementPlease()
    ) {
        /*compute*/
    }
}
But what goes in the ??? part? I could wrap each value in an optional or use a pointer instead. Neither of these options is great: the optional adds more overhead and wrapping, while a pointer may run into lifetime issues, as anything created inside the function is probably no longer alive by the time we get to it. Moreover, the interface is quite tedious to use, as we need to keep track of the current iterator and the end, and stuff all of this into a function object.
If we take in a generator instead, we can iterate over it with a normal range-for loop and our template wrapper also becomes a bit simpler:
// defined in its own translation unit (source file)
void myDoStuffImpl2(std::generator<Type> generator)
{
    for (auto&& value : generator) {
        /*compute*/
    }
}

// defined in a header
template <std::ranges::range Range>
void myDoStuff(Range&& range)
{
    myDoStuffImpl2([&]() -> std::generator<Type> {
        co_yield std::ranges::elements_of(std::forward<Range>(range));
    }()); // <- note calling the lambda
}
Yes, the interface part is quite ugly, as is tradition for generic template programming in C++. However, it happily eats any range we might want to throw at it, which means callers need not fight with my interface: I’ll just accept whatever they have. The number of times I’ve had the data I needed in a std::set or something while the function I needed to call only accepted a std::vector is too high. This of course isn’t free, but the performance impact is probably not significant; at least I’ve not run into a case where the use of a generator was the bottleneck.
Fun fact: the implementation of yielding a std::ranges::elements_of of a range in std::generator (probably) uses this same mechanism of erasing the type by wrapping it in a std::generator.
Flattened call stack
Writing some part of a bigger application is a challenge in its own right. Often there is a need to process a range of objects in two or more steps, and with user-facing applications the user often wants to be able to cancel that processing. Implementing this in a straightforward way tends to pollute my beautiful free functions with integrations into the rest of the system. It would be great to have interruptible processing without always reaching for a separate thread (where the interruption would amount to terminating said thread).
As generators are coroutines, they co-operate with the caller (or other coroutines) on who gets to run and when. In my generator, once I’m done creating a value, I can pass control back to the caller in full; that is, back onto the caller’s stack frame, with no extra nesting. The traditional method of passing control back to the caller is via a callback (sometimes with client data, sometimes without). Even with type erasure using std::function, I still need to know what to accept: probably something like a std::function<bool()> called keepGoing. And then I have to remember to call it in all the right places.
But what if I give back control for every value I produce? Then I don’t need to care what the cancellation mechanism is, or whether there even is one in the first place. Granted, the process can then no longer be cancelled during the creation of a value, only once a value is produced; but in my defence, cancelling mid-value would likely have been impossible (or a bug) anyway. When writing a generator I only need to make sure that no lock, or anything else not guarded by RAII, is held across a suspension point (a co_yield statement), and then any cancellation mechanism (e.g. early return, exception) will work just fine to cancel my generator.
/** Makes a bunch of values that are passed back as they are generated. */
auto makeValuesGenerator()
    -> std::generator<std::string>;

/** Some stateful computation.
 * Returns false when no more input is accepted. */
auto compute(std::string& accumulator, const std::string& value)
    -> bool;

auto processOnlyNecessary()
    -> std::string
{
    auto result = std::string{};
    for (const auto& value : makeValuesGenerator()) {
        if (not compute(result, value)) {
            break; // The generator is destroyed and its
                   // stack's destructors are called.
        }
    }
    return result;
}
Bonus points: if I was already going to use a generator to reduce memory overhead from O(n) (a full vector) to O(1), I don’t need to make any changes to my pure and self-contained generator.
Decoupling
The situation that prompted me to write this post was a pipeline with two parts: create values from a file, and consume them (outputting to another file). This may sound very simple, but I assure you the logic in both parts is very much nontrivial, so it makes sense to split the pipeline firmly in two. A great benefit of this is that I can then run each part in its own thread, for up to 50 % shorter runtime. However, I also suspect there may sometimes be bugs that only show up when running my code in multiple threads, so I need to be able to run the parts in a single thread as well.
If I were to use a message queue or something to support the multithreaded setting, I’d have to put a serialization point between the parts and collect the values in its own thread for single-threaded runs, since neither part would support co-operative execution. But with generators they do! And I don’t need to decide on what kind of message queue to use to connect them, as the first part can be a generator and the second part take a generator as parameter:
auto createValues(const std::filesystem::path& fileIn)
    -> std::generator<Value>;

void consumeValues(
    const std::filesystem::path& fileOut,
    std::generator<Value> values
);
Now I can use any method I want to glue these parts together, chosen at runtime, with the same creating and consuming code running in both cases. Additionally, it’s much easier to test now that I don’t have to mock reading the message queue for serialization. If we assume a magical message queue class template called Channel that allows pushing values into the queue and reading them in a range-for loop, we get something like this:
auto createValues(const std::filesystem::path& fileIn)
    -> std::generator<Value>;

void consumeValues(
    const std::filesystem::path& fileOut,
    std::generator<Value> values
);

void runSingleThreaded(
    const std::filesystem::path& fileIn,
    const std::filesystem::path& fileOut
)
{
    consumeValues(fileOut, createValues(fileIn));
}

void runMultithreaded(
    const std::filesystem::path& fileIn,
    const std::filesystem::path& fileOut
)
{
    auto createChannel = Channel<Value>{};
    auto ready = std::latch{2};
    auto createThread = std::jthread{[&] {
        for (const auto& input : createValues(fileIn)) {
            createChannel.push(input);
        }
        createChannel.close();
        ready.count_down();
    }};
    auto convertThread = std::jthread{[&] {
        consumeValues(fileOut, [&]() -> std::generator<Value> {
            co_yield std::ranges::elements_of(createChannel);
        }());
        ready.count_down();
    }};
    /* Do some multithread monitoring or something
     * while we wait for the two threads to run. */
    ready.wait();
}
The beautiful thing here is that neither createValues nor consumeValues knows about threading or channels; all they care about is creating and consuming values, respectively. Of course, a generator isn’t a silver bullet, and relying on a generator alone for input leads to some ambiguity when that range ends abruptly.
Concluding remarks
While I’ve yet to find good uses for any other type of coroutine (e.g. task), generators have been wonderful to work with, to the annoyance of my coworkers. This blog focuses on C++, and more specifically C++20 and later, because that’s what I like and what I write, but these ideas should extend to other languages with similar functionality.
Notes
The way objects are passed out of generators, between threads, etc. is not optimal. There may also be other subtle bugs in the snippets: I wrote this in one go without a compiler or linter, and couldn’t be bothered to optimize all the code.