Beyond the Interpreter: Building a Suspendable Async JIT for Python 3.13
Most JIT compilers treat a Python function as a linear stream of instructions. But once you enter the world of async/await, a coroutine is no longer just a function: it is a state machine that can be suspended and resumed at every await. In the latest version of JustJIT, I've moved beyond simple, straight-line "Object Mode" execution to tackle the async-JIT paradox: keeping execution in native machine code across non-blocking suspension points.
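That state-machine nature is visible even without a JIT: a plain async def can be driven by hand with send(), and every await is a point where the coroutine's state must survive. A minimal sketch (Suspend and work are illustrative names, not part of JustJIT):

```python
# Driving a coroutine manually shows each suspension point explicitly.
class Suspend:
    def __await__(self):
        yield "suspended"   # hand control back to whoever is driving us

async def work():
    total = 0
    for i in range(3):
        total += i
        await Suspend()     # suspension point: local state is saved here
    return total

coro = work()
suspensions = []
try:
    while True:
        suspensions.append(coro.send(None))  # resume until next suspension
except StopIteration as done:
    result = done.value     # the coroutine's return value

print(len(suspensions), result)  # 3 suspensions, returns 3
```

The local total survives all three suspensions, which is exactly the property a JIT must preserve once those locals live in registers instead of a Python frame.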
The Architecture: Continuation Passing in LLVM
When a coroutine hits an await, the native call stack is unwound as control returns to the event loop, taking any stack-resident locals with it. To solve this, JustJIT applies a continuation-passing-style (CPS) transformation: the function is split into discrete LLVM basic blocks, and a persistent, heap-allocated frame stores local variables such as acc and NumPy pointers across suspensions.
At the entry of the JITed code, an LLVM switch on a %state variable dispatches to the correct basic block. This lets the JIT jump execution straight back into the middle of a loop, exactly where it left off, without re-running the function prologue.
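The shape of this transform can be sketched in pure Python: an if-chain stands in for the LLVM switch on %state, and a heap object stands in for the persistent frame. JITFrame and resume are hypothetical names for illustration, not JustJIT's actual API:

```python
class JITFrame:
    """Heap-allocated frame standing in for the native frame."""
    def __init__(self, n):
        self.state = 0   # which basic block to resume at
        self.acc = 0     # local preserved across suspensions
        self.i = 0
        self.n = n

def resume(frame):
    """One native entry: switch on state, run until the next await."""
    if frame.state == 0:            # entry block: prologue runs once
        frame.state = 1
    if frame.state == 1:            # loop block
        if frame.i < frame.n:
            frame.acc += frame.i
            frame.i += 1
            return "SUSPEND"        # would yield to the event loop here
        frame.state = 2
    return ("DONE", frame.acc)      # exit block

f = JITFrame(4)
result = resume(f)
while result == "SUSPEND":          # the event loop's role: keep resuming
    result = resume(f)
print(result)   # ('DONE', 6)
```

Because acc and i live on the frame rather than the call stack, every resume lands back inside the loop with no prologue re-executed, which is the effect the %state switch achieves in native code.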
Handling the "Final Boss": Async Exception Unwinding
The true test of a JIT isn't just speed; it's reliability. In an async environment, a task can be cancelled at any suspension point. If the JIT doesn't understand the Python exception table, a call to task.cancel() could result in a hard segfault or a leaked reference.
JustJIT proactively parses the 3.13 exception table to map bytecode offsets to native landing pads. During our stress tests, when a CancelledError was injected, the JIT detected the error state upon resumption, unwound the internal stack, and jumped to the correct except handler. This means that even when a task fails or is cancelled mid-flight, reference-counted objects like NumPy arrays are released safely.
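The behavior those landing pads must reproduce is easy to observe in plain CPython: cancel a task parked at a suspension point, and the except/finally machinery runs before the task dies. A pure-Python sketch of the scenario (consumer, the log list, and the bytearray standing in for a NumPy buffer are all illustrative):

```python
import asyncio

async def consumer(log):
    buf = bytearray(1024)            # stands in for a NumPy buffer
    try:
        while True:
            await asyncio.sleep(0)   # suspension point: cancellation lands here
    except asyncio.CancelledError:
        log.append("unwound")        # the landing-pad equivalent
        raise                        # re-raise so the task ends as cancelled
    finally:
        del buf                      # cleanup the JIT must also guarantee
        log.append("cleaned")

async def main():
    log = []
    task = asyncio.create_task(consumer(log))
    await asyncio.sleep(0)           # let the task start and suspend
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return log

log = asyncio.run(main())
print(log)   # ['unwound', 'cleaned']
```

A JITed coroutine has to hit the same two stops, the except handler and the finally cleanup, in the same order, even though its "frame" is a native heap structure rather than a Python one.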
Results: Stability Meets Compatibility
In our recent "Stress Case Zero," JustJIT maintained consistent state through more than 1,000 suspensions while reading and writing external NumPy buffers. We are still operating in "Object Mode," so performance currently matches CPython, but the suspension architecture itself is now verified and robust. We have successfully bridged the gap between a high-level, reference-counted event loop and native LLVM execution.
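A pure-Python analogue of that stress case makes the invariant concrete: suspend 1,000 times, mutate an external buffer on every step, and verify that no state was lost. The plain list stands in for a NumPy buffer, and the names and counts are illustrative:

```python
import asyncio

async def stress(n, buf):
    acc = 0
    for i in range(n):
        buf[i % len(buf)] += 1    # touch the external buffer each step
        acc += i                  # local state that must survive suspension
        await asyncio.sleep(0)    # hand control back to the event loop
    return acc

buf = [0] * 16
acc = asyncio.run(stress(1000, buf))
print(acc, sum(buf))   # 499500 1000
```

The check at the end is the whole point: after a thousand round trips through the event loop, both the coroutine-local accumulator and the externally shared buffer must be exactly what a straight-line run would have produced.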
