A few months ago, I stumbled upon tinyio, a Python package by Patrick Kidger implementing an event loop for Python in a few hundred lines. I was impressed: partly because of my own experience implementing event loops for Python1, but mainly because of the simplicity of the project's design.
My excitement dropped quite a bit the moment I discovered the project doesn't provide any primitive to actually deal with I/O, but it surely demonstrated that it's not that hard to reimplement an async runtime from scratch in Python.
The existence of tinyio confirmed to me that it was perfectly possible to move away from AsyncIO, with its sharp corners and limitations, and do things differently – a sentiment that had been growing in me since the very first releases of RLoop.
There are some very good articles describing the frustrations and sharp corners of asyncio, which can probably give you a much more complete answer to the question "what's wrong with AsyncIO", like:
And while I share many thoughts with those authors, my main problems with AsyncIO can probably be summarized in the following points:
To reiterate a bit on the design decisions: in my talk at PyconIT last year, I presented a slide asking the audience about the difference between AsyncIO's protocols and transports. Nobody could tell. Can you? The official Python docs try to explain this difference in three different ways. And while none of them sounds particularly satisfying to me, in my experience, when you need three different ways to explain something, it's not a good sign. To bring even more confusion to the table: AsyncIO protocols and transports have the same lifecycle, so you need both of them, all the time, to handle a single socket communication. Why? Nobody really seems to know.
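To make the critique concrete, here's about the smallest echo server you can write with AsyncIO's protocol/transport pair: the protocol holds your callbacks, but it can't send a single byte without the transport handed to it in `connection_made`.

```python
import asyncio

class Echo(asyncio.Protocol):
    def connection_made(self, transport):
        # the transport arrives via one of the protocol's own callbacks...
        self.transport = transport

    def data_received(self, data):
        # ...and the protocol can't reply without holding onto it
        self.transport.write(data)
```

Two objects, one lifecycle: the protocol receives the bytes, the transport sends them, and neither is useful without the other.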
You might see why, then, I really liked the design of tinyio. It is simple.
And you might also see why, with the release of free-threaded Python, I started questioning: why should I care about the main thread at all?
As a developer, you're probably familiar with the "Reinventing the Wheel" argument. While I agree that when you try to rebuild something from scratch you often end up with something that has the same flaws and drawbacks as the original, I'm also quite a big fan of the following quote:
"The reasonable man adapts himself to the world around him; the unreasonable one persists in trying to adapt the world to himself. Therefore all progress depends on the unreasonable man."
― George Bernard Shaw
So, whenever I can, I try to be unreasonable. And in this specific context I totally can: I'm not bound to business decisions, I don't need to motivate the time this will require. It's my free time, and I can learn a lot from doing this.
Call me a boomer, but I wish more people were unreasonable these days, especially considering we're in the productivity era empowered by "AI", where the narrative seems to be centered around the idea that you don't need to know, or to learn: Claude can do this.
Well, f*ck you Claude. I'm going to actually code this sh*t. I won't just write down a markdown file or type into an input box "write an async multithreaded runtime for Python, make no mistakes". I'm going to use my brain to actually think about this stuff, and use my fingers to type on the keyboard. And in a month, a year, or a decade from now, I will be able to explain what the hell my code is doing, and why.
When thinking about how to build an async runtime, the first step should be to think about what async code is actually all about. And – from my perspective – async code is really all about suspension.
Whenever you type await or yield in your code, your purpose is to tell the computer "yo, at this point suspend whatever is happening and do something else". You don't really think about coroutines, and futures, and tasks, and whatever the hell is happening behind those keywords.
And if it's all about suspension, it's also all about resumption. We want to suspend our code until something happens, and then have execution resume from where we left off. Thus, we can say an async runtime is something that lets you suspend your code and resume execution on specific events. A simple design should then probably be centered around the concept of an event.
And this is why in TonIO – yep, this is the name I picked for this runtime – everything is built around the Event class, which is literally a wrapper around an atomic flag2:
```python
class Event:
    flag: bool
```
So whenever we want to suspend execution waiting for something, that something is actually an Event object, and once it happens, we flip the flag and resume execution. But to do that, we need an additional piece to represent the actual suspension point. Or, in other words, we need something that represents the fact that we're waiting on an event. Let's call it a waiter.
With a Waiter object, we can now link together the suspension point and the event that suspension point will wait for:
```python
class Event:
    flag: bool = False
    waiters: list[Waiter] = []

    def wait(self) -> Waiter:
        return Waiter(self)

    def set(self):
        self.flag = True
        for waiter in self.waiters:
            waiter.wake()


class Waiter:
    event: Event
    origin: Coroutine

    def register(self, origin):
        self.origin = origin
        self.event.waiters.append(self)

    def wake(self):
        self.origin.resume()
```
Now, we "only" need to write our event loop such that whenever a Waiter object gets yielded by a coroutine, we "register" the waiter and store the suspension point. Then, whenever we "wake" a waiter, we simply schedule another iteration of the coroutine on the loop.
That's basically it. No futures, no tasks, no primitives other than an event and a waiter for that event. Of course, there's a lot more to consider and add here to get a fully working runtime (cancellations, timeouts, control flow, etc.), but this basic design is more than enough to handle async code. TonIO is designed exactly this way, except it's not written in Python, but in Rust, for good reasons.
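For illustration, here's a rough pure-Python sketch of such a loop (hypothetical and simplified: the real TonIO loop is in Rust, and the `wake` wiring here is condensed into a closure). Plain generators stand in for coroutines, and anything that yields a Waiter gets parked until its event fires:

```python
from collections import deque

class Event:
    """A flag plus the waiters currently parked on it."""
    def __init__(self):
        self.flag = False
        self.waiters = []

    def wait(self):
        return Waiter(self)

    def set(self):
        self.flag = True
        for waiter in self.waiters:
            waiter.wake()
        self.waiters.clear()

class Waiter:
    """Links a suspension point to the event it waits for."""
    def __init__(self, event):
        self.event = event
        self.wake = None  # filled in by the loop at registration time

def run(*coros):
    """Drive generators that yield Waiter objects to suspend themselves."""
    ready = deque(coros)
    while ready:
        coro = ready.popleft()
        try:
            yielded = coro.send(None)
        except StopIteration:
            continue  # coroutine finished
        if isinstance(yielded, Waiter):
            # register: when the event fires, re-schedule this coroutine
            yielded.wake = lambda c=coro: ready.append(c)
            yielded.event.waiters.append(yielded)
        else:
            ready.append(coro)  # bare yield: just reschedule

# usage: one coroutine parks on an event, another one sets it
done = Event()
log = []

def sleeper():
    log.append("suspending")
    yield done.wait()
    log.append("resumed")

def trigger():
    yield  # give sleeper a chance to park first
    done.set()

run(sleeper(), trigger())
print(log)  # ['suspending', 'resumed']
```

The whole scheduler fits in one `while` loop precisely because the only things it has to understand are events and the waiters parked on them.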
One thing I always liked about async runtimes in Rust is the overall simplicity of using them:
```rust
use mini_redis::{client, Result};

#[tokio::main]
async fn main() -> Result<()> {
    let mut client = client::connect("127.0.0.1:6379").await?;
    client.set("hello", "world".into()).await?;
    let result = client.get("hello").await?;
    println!("got value from the server: {:?}", result);
    Ok(())
}
```
With a single line (#[tokio::main]) you get a working multi-threaded async runtime, and your code will just run. You don't need to worry about threads, special threads (like the main one), or configuring things.
I wanted the same experience in Python:
import tonio.colored as tonio
@tonio.main
async def main():
client = await client.connect("127.0.0.1:6379")
await client.set("hello", "world")
result = await client.get("hello")
print(f"got value from the server: {result}")
main()
So, like in Rust's Tokio, with just one line – a decorator in this case – your async code will just run in a multi-threaded fashion, with concurrency matching the number of cores the machine has. You want to run an HTTP server? No need to worry about processes, threads, or similar configuration hassles: just write your main function and let the runtime take care of the rest.
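Conceptually, you can picture what such a decorator does with a hypothetical pure-Python stand-in (the name `main`, the driver, and the pool sizing here are all illustrative; TonIO's real decorator is implemented in Rust): wrap the coroutine in a driver and submit it to a pool sized to the machine's cores.

```python
import os
from concurrent.futures import ThreadPoolExecutor

def main(coro_fn):
    """Hypothetical stand-in for a tonio.main-style decorator: run the
    coroutine to completion on a pool sized to the available cores."""
    def runner(*args, **kwargs):
        with ThreadPoolExecutor(max_workers=os.cpu_count() or 1) as pool:
            def drive():
                coro = coro_fn(*args, **kwargs)
                try:
                    while True:
                        coro.send(None)  # step the coroutine until done
                except StopIteration as stop:
                    return stop.value  # the coroutine's return value
            return pool.submit(drive).result()
    return runner

@main
async def hello():
    return "ran on a worker thread"

print(hello())
```

The point of the pattern is that the decorated function keeps a plain, synchronous call signature: the runtime setup is entirely hidden behind the decorator.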
Now, of course, all of the above works only if the GIL is not around. But remember: that's what we wanted to change in the first place. We accept targeting only the very small percentage of today's code that is compatible with free-threaded Python, and excluding the vast majority of existing Python async code from our runtime. But given that our runtime is not compatible with any existing async standard in the Python ecosystem – its primitives are just coroutines and events; we don't rely on AsyncIO's futures, tasks, or any other primitive – it's probably not a big deal: a big portion of existing async code would need to be rewritten in any case.
With the GIL out of our way, we can stop worrying about the main thread, as in this design the code runs on one of many available threads. And to simplify the design even further, we can just avoid running any "application code" on the main thread at all. In fact, this is how TonIO is designed: the Python main thread is used only to deal with low-level I/O primitives. Or, in other words, the actual event loop runs in the main thread, but instead of also using that loop to run coroutines, we use a separate threadpool to run that code. No more worries about the main thread versus everything else: you simply can't run anything on that thread. All of your code will just run on identical, generic, non-special threads.
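The split can be sketched in plain Python with the standard selectors module (a hypothetical sketch: threading.Event stands in for TonIO's atomic-flag events, and the I/O loop runs in a spawned thread here for demo convenience, whereas in TonIO it owns the actual main thread). The I/O loop only polls for readiness and flips events; worker threads block on those events, never on the sockets themselves.

```python
import selectors
import socket
import threading

sel = selectors.DefaultSelector()
stop = threading.Event()

def io_loop():
    # I/O thread: poll readiness and wake whoever registered interest
    while not stop.is_set():
        for key, _ in sel.select(timeout=0.1):
            key.data.set()  # key.data is the event attached at registration

# a connected socket pair: `a` writes, `b` is watched for readability
a, b = socket.socketpair()
readable = threading.Event()
sel.register(b, selectors.EVENT_READ, data=readable)

received = []

def worker():
    # worker thread: block on the event, not on the socket
    readable.wait()
    received.append(b.recv(1024))

w = threading.Thread(target=worker)
w.start()
io_thread = threading.Thread(target=io_loop)
io_thread.start()

a.send(b"ping")
w.join()
stop.set()
io_thread.join()
print(received)  # [b'ping']
```

The worker never touches `select` and the I/O loop never touches application data: the only thing crossing the boundary is an event being flipped.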
The choice of Rust to implement the internals of our runtime plays a much more decisive role now that we're in a multi-threaded context. The Rust language provides a bunch of useful primitives for dealing with multi-threaded code: for instance, in the prior section we described the Event object as a wrapper around an atomic flag, but when we represented that object in pseudo-Python code, we wrote it as a plain boolean. Rust's std::sync::atomic module provides an actual atomic boolean flag, among other primitives: we can use these without re-implementing everything from scratch, and let Python code interact with them through wrappers.
So, now that we have our primitives and the actual event loop implemented, a question arises: is it any good?
This shouldn't be a surprise. We removed a bunch of unneeded complexity from AsyncIO, so we need to do less work. Or, more precisely, we don't waste time on the overhead of the abstractions while doing the same amount of actual work: the code contained in the coroutines.
"How much overhead?" you might ask. Well, by running the simplest benchmark possible to measure that – one million coroutines, each computing a simple power – we can verify TonIO to be two to three and a half times faster than AsyncIO. And that's running our multi-threaded runtime with just 1 thread, purely thanks to simpler and fewer primitives.
| Runtime | Total execution time | Relative performance |
|---|---|---|
| AsyncIO | 2937.958ms | (1.0x) |
| TonIO (yield syntax) | 862.389ms | (3.41x) |
| TonIO (async syntax) | 1255.214ms | (2.34x) |
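For reference, the AsyncIO side of that micro-benchmark can be reconstructed roughly like this (a hypothetical reconstruction; the exact benchmark code may differ):

```python
import asyncio
import time

async def work(n):
    return n ** 2  # a trivial power computation per coroutine

async def bench(count):
    await asyncio.gather(*(work(i) for i in range(count)))

# the article's run used one million coroutines; smaller here for brevity
start = time.perf_counter()
asyncio.run(bench(100_000))
print(f"{(time.perf_counter() - start) * 1000:.3f}ms")
```

Since each coroutine does almost nothing, virtually all of the measured time is the runtime's own scheduling overhead, which is exactly what the table above compares.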
"But what about actual I/O?" you might ask. We can set up a 10KB TCP echo benchmark for this, and we measure similar performance improvements:
| Runtime | Throughput (RPS) | Relative performance |
|---|---|---|
| AsyncIO | 47188.9 | (1.0x) |
| TonIO (async syntax) | 97723.4 | (2.07x) |
| TonIO (async syntax), 2 threads | 157012.3 | (3.33x) |
And remember: we're running TonIO with a very low number of threads here.
Isn't it impressive how reinventing the wheel can, out of the box, more than double the performance of Python's async code?
While the above results look quite promising, TonIO is in its very early days, and there's plenty of stuff to be done. The hardest part? Promote adoption.
At the time of writing, TonIO lacks high-level abstractions over network primitives, as it only implements a socket-like interface to perform network operations. A streams API is already planned for the near future, and it will probably look similar to what Trio does, to lower as much as possible the burden of migrating existing code to TonIO.
But that would be just the first enabler for TonIO adoption, and I'm already thinking about releasing a package or some utilities to monkey-patch existing popular libraries – things like httpx, redis, asyncpg and others – and make them work out of the box with TonIO. I'm also running some experiments on servers like Granian – another project I maintain – and uvicorn, to find the best way to support TonIO in the existing async ecosystem. Possibly, in the future, projects like AnyIO will support TonIO out of the box.
Regardless of what the future will look like, I'm happy that I was – once again – the unreasonable man. That I reinvented the wheel, even if it looked like a bad idea for numerous reasons.
I'm happy that I pursued this in an era where the mainstream narrative seems to suggest we don't need to think about what to invent, or that we don't need to think about code at all. And I hope this can inspire all those junior developers entering the market right now to try to build things, not just ship things: to learn about the low-level parts of the stack, facing first-hand the problems, issues and trade-offs different people experienced before them. I started my career by writing my own web framework, ORM, and templating engine. 15 years later, I'm still reinventing the wheel, because there's no limit to what you can learn out there.
We don't need more regurgitated code clones these days. We need more well-thought-out, clearly designed, novel code that aims to improve the status quo. Be unreasonable, whenever you can.
See the RLoop project, my first – not yet completely successful – attempt to make AsyncIO faster. ↩
Most of the code snippets here are for demonstration purposes; TonIO is actually written in Rust. ↩