Concurrency with Python#
The Python standard library offers several packages for running things concurrently. In this blog post, I’ll try to explain the different packages and when it’s best to use each of them.
Note
After writing this blog post, I discovered https://superfastpython.com/. To be honest, it’s a way better resource about concurrency in Python in general. For the particular topic of this blog post, I would recommend reading https://superfastpython.com/python-concurrency-choose-api/
subprocess#
The subprocess package is for calling other programs. When using this package, you’re basically starting another, specialized program to handle a task.
The operating system decides how these subprocesses are run. If you have multiple CPUs, these subprocesses might run in parallel.
Loading and running other programs does cost performance, and exchanging data between processes takes extra work, but keeping the processes and their memory spaces separate also makes this a more secure way to combine different specialized programs.
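As a minimal sketch of what that looks like, here is a call to subprocess.run; the echo command is just a placeholder for whatever specialized program you would actually run.

```python
import subprocess

# Start an external program and capture its output.
# "echo" is only a placeholder command for illustration.
result = subprocess.run(
    ["echo", "hello from another process"],
    capture_output=True,
    text=True,
    check=True,
)
print(result.stdout)
```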
multiprocessing#
This is a package that allows Python code to be executed as a system process (almost like the ones in a subprocess described above), bypassing the Global Interpreter Lock. These processes don’t (easily) share memory and are managed by the operating system.
This is handy when you have a CPU-bound problem and you want the calculation spread over all the CPUs you have.
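Here’s a small sketch using multiprocessing.Pool; the cpu_bound function is a made-up stand-in for real CPU-heavy work.

```python
from multiprocessing import Pool

def cpu_bound(n):
    # A deliberately heavy calculation, standing in for real work.
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    # Each chunk of work runs in its own Python process,
    # so the calculation can use multiple CPUs in parallel.
    with Pool() as pool:
        results = pool.map(cpu_bound, [10_000_000] * 4)
    print(results)
```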
threading#
Threading is for running multiple threads in the same application (or process space), basically sharing memory. Threads load faster and have less overhead compared to the processes described above.
However, even though threads are managed by the operating system, the Global Interpreter Lock allows only one thread at a time to execute Python bytecode in the Python interpreter.
The Global Interpreter Lock is released when doing I/O operations. That makes threading still handy for I/O-bound problems, where your program spends most of its time waiting to communicate with slow devices, like a network connection, a hard drive or a printer.
Keep in mind that a thread is part of a process, and a crashing thread can take the whole process down with it.
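A rough sketch of that I/O-bound case, using a ThreadPoolExecutor from concurrent.futures; the URL and the fetch helper are just placeholders for whatever slow I/O your program does.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen

# Placeholder URLs; any slow I/O works the same way.
urls = ["https://example.com"] * 3

def fetch(url):
    # The GIL is released while waiting on the network,
    # so these downloads overlap in time.
    with urlopen(url) as response:
        return len(response.read())

with ThreadPoolExecutor(max_workers=3) as executor:
    sizes = list(executor.map(fetch, urls))
print(sizes)
```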
asyncio#
This is a library for writing concurrent code using the async/await pattern. The event loop runs asynchronous tasks and callbacks, performs network I/O operations and runs subprocesses. It’s a single-threaded, single-process construct that helps with I/O-bound applications.
The big difference here is that the context switching between tasks is actively managed by the code the developer writes. By yielding or returning from coroutines, control of the thread is handed back to the caller. This is called “cooperative multitasking” or “non-preemptive multitasking”. All the other techniques above are examples of preemptive multitasking.
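A minimal sketch of that cooperative hand-off could look like this; the task coroutine and asyncio.sleep stand in for real asynchronous work.

```python
import asyncio

async def task(name, delay):
    # "await" is the point where this coroutine hands control
    # back to the event loop so other tasks can run.
    await asyncio.sleep(delay)
    return f"{name} finished after {delay}s"

async def main():
    # Both tasks run concurrently on a single thread.
    results = await asyncio.gather(task("a", 1), task("b", 1))
    print(results)

asyncio.run(main())
```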
When it comes down to I/O-bound applications: use asyncio where you can, threading when you must.