- cross-posted to:
- python@programming.dev
- technews@radiation.party
Meta is dedicating 3 engineers to getting the nogil patches into CPython, and some other companies are stepping up as well. This is huge: it's the closest we have ever been to solving the issue of the GIL.
As a filthy casual, could anyone give me a link or brief summary as to why the GIL should/shouldn’t go away?
Basically there is no parallel execution of Python code within a single process as long as the global interpreter lock (GIL) exists. It prevents more than one Python thread from running at the same time. However, many of the major libraries used in Python call out to native libraries that can and do release the GIL to run native code in parallel.
The GIL lets only one thread execute Python code at a time. A Python program can be multithreaded, but only one thread runs in CPython at any moment. If one thread makes a system call (like copying a file), it releases the GIL while it waits, so the system call keeps running in the OS while another Python thread takes over. That means there are situations where multithreading can speed up Python programs, even though only one thread runs Python code at a time.
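To make the "waiting threads overlap" point concrete, here's a rough sketch. It uses `time.sleep` as a stand-in for a blocking system call (sleep releases the GIL while it waits), and `run_sleepers` is just a name I made up for the demo.

```python
import threading
import time


def run_sleepers(n_threads, seconds):
    """Start n_threads threads that each block for `seconds`; return wall time."""
    # time.sleep stands in for a blocking system call: the thread gives up
    # the GIL while it waits, so the other threads can wait at the same time.
    threads = [
        threading.Thread(target=time.sleep, args=(seconds,))
        for _ in range(n_threads)
    ]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    elapsed = run_sleepers(4, 0.2)
    print(f"4 threads x 0.2 s of waiting took {elapsed:.2f} s")
```

If the waits were serialized this would take about 0.8 s. Since they overlap, it should come out close to 0.2 s.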
You can run multiple instances of CPython, which is called multiprocessing, and each instance will run one Python thread at a time. Each process has its own memory space, so all communication between processes has to be handled explicitly (afaik, by definition, threads share the same memory space, processes do not).
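A small sketch of the separate-memory point, in case it helps. Assumptions on my part: it uses the `fork` start method, so it only works on POSIX systems (Linux/macOS), and `counter`, `bump_and_report` and `demo` are made-up names for illustration.

```python
import multiprocessing as mp

counter = 0  # module-level state; every process gets its own copy


def bump_and_report(queue):
    """Runs in the child: change the child's copy and send the value back."""
    global counter
    counter += 1
    queue.put(counter)  # explicit communication, nothing is shared implicitly


def demo():
    # Assumption: "fork" is available (POSIX only). Other start methods
    # work too, but need the usual `if __name__ == "__main__"` guard.
    ctx = mp.get_context("fork")
    queue = ctx.Queue()
    child = ctx.Process(target=bump_and_report, args=(queue,))
    child.start()
    child_value = queue.get(timeout=10)  # read before join to avoid blocking
    child.join()
    return child_value, counter  # the parent's counter was never touched


if __name__ == "__main__":
    print(demo())
```

The child sees its own `counter` go to 1, while the parent's copy stays at 0. The only way the value gets back is through the queue.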
Any library calls not written in Python don’t run in the interpreter, so the most common performance-critical things aren’t limited too badly. For example, I install NumPy and SciPy builds that are compiled against Intel’s MKL library. Any NumPy operations execute in MKL, not the Python interpreter, so they are almost as fast as writing the program in C and compiling against MKL myself. And I can write Python and NumPy code about 10x faster than C/MKL. And if I’m on a computer that doesn’t have MKL, I can install a different NumPy build and it will execute just fine without changing the code.
There’s a book called “High Performance Python” that helped me figure out a lot of this.
Edit: thought I was posting on the grandparent’s post instead of the parent post. Sorry.
No worries!
This proposal was linked in the page above, and I think the first few sentences do a pretty good job of summarizing the problem.
a simple explanation but not 100% correct is that even if your code is made to run in parallel using threads, it will never use more than 1 core in your computer.
getting rid of the GIL will let it use all the cores in the processor.
the multiprocessing module “solved” this problem by forking processes instead of threads, but it’s not ideal for a lot of workloads.
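if anyone wants to see the "never more than 1 core" thing for themselves, here's a rough sketch. it runs the same pure-Python countdown sequentially and then in threads and returns both timings. the function names and the loop size are just things I picked for the demo.

```python
import threading
import time


def count_down(n):
    """Pure-Python CPU-bound work: it never waits, so it holds the GIL."""
    while n > 0:
        n -= 1


def timed(fn):
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start


def sequential(n, workers):
    for _ in range(workers):
        count_down(n)


def threaded(n, workers):
    threads = [
        threading.Thread(target=count_down, args=(n,)) for _ in range(workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


def compare(n=2_000_000, workers=4):
    """Return (sequential_seconds, threaded_seconds) for the same total work."""
    return (
        timed(lambda: sequential(n, workers)),
        timed(lambda: threaded(n, workers)),
    )


if __name__ == "__main__":
    seq, thr = compare()
    print(f"sequential: {seq:.2f} s, threaded: {thr:.2f} s")
```

on a regular CPython build with the GIL I'd expect the two numbers to be roughly the same, since the threads just take turns. on a nogil build the threaded one should be able to pull ahead, but I haven't measured that so treat it as a guess.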
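The GIL is a lock around the interpreter. Threads still share the same memory space, but the GIL prevents more than one of them from executing Python code at once. That protects the interpreter's internal state from race conditions and, more importantly, keeps the reference counters correct so that CPython can correctly free memory (avoiding leaks and objects being freed while still in use).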
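For anyone who hasn't seen the reference counters in action, here's a tiny sketch using `sys.getrefcount`. The function name `refcount_delta` is made up for the demo, and the absolute numbers `getrefcount` reports vary between CPython versions, so it only looks at the difference.

```python
import sys


def refcount_delta():
    """Return how much an object's refcount grows when one more reference exists."""
    obj = object()
    before = sys.getrefcount(obj)
    holder = [obj]  # the list now holds one extra reference to obj
    after = sys.getrefcount(obj)
    del holder  # dropping the list releases that reference again
    return after - before


if __name__ == "__main__":
    print(refcount_delta())
```

Every new reference bumps a plain integer stored on the object. If two threads could update that integer at the same moment without any locking, an update could get lost, and the object would either never be freed or be freed too early. That's the bookkeeping the GIL keeps consistent.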