Why does my multiprocessing Python program hang?

Tags: Python
Published: March 2, 2023
Author: yanbc

Help! My Python program hangs

Have you ever had a Python program that just won't behave? You run it, you wait, you wait some more... and nothing happens. It's like the code has a mind of its own and refuses to cooperate.
Well, the other day at work, my colleague Bob came up to me with one such program. It was a multiprocessing script that just wouldn't finish. Bob was pulling his hair out, wondering what was going on.
I took a look at Bob's code, scratched my head for a moment, and then it hit me. It all came down to Python's multiprocessing queues. And that's what I want to talk to you about today.
But before we dive into the nitty-gritty, let me show you the code that was causing all the trouble.
```python
from multiprocessing import Process, Queue


def worker(q: Queue):
    data = '1' * 105656
    q.put(data, block=False)
    print(f"worker ended")


def main():
    q = Queue()
    p = Process(target=worker, args=(q,))
    p.start()
    print("worker started")
    # q.get()
    p.join()
    print("worker joined")
    print('all done')


if __name__ == "__main__":
    main()
```

Why multiprocessing?

If you've ever worked on a CPU-bound Python program, you may have heard of the Global Interpreter Lock (GIL). It's a mechanism that prevents multiple threads from executing Python bytecode simultaneously, which can limit performance on multi-core CPUs.
One way to work around the GIL is to use multiprocessing, which lets you execute Python code across multiple processes instead of threads. The multiprocessing library makes it relatively painless to write code that takes advantage of multiple CPU cores.
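To make this concrete, here is a minimal sketch of farming CPU-bound work out to a process pool. The fib function is just a stand-in for any heavy computation:

```python
from multiprocessing import Pool


def fib(n: int) -> int:
    # Deliberately slow, CPU-bound recursion.
    return n if n < 2 else fib(n - 1) + fib(n - 2)


if __name__ == "__main__":
    # Each fib(30) runs in its own process, so one interpreter's
    # GIL doesn't serialize the others.
    with Pool() as pool:
        print(pool.map(fib, [30, 30, 30, 30]))
```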
Bob was so excited about his new multiprocessing program. He had heard that multiprocessing could harness the power of multiple cores to make his program run faster, and he couldn't wait to try it out.
But when he ran the code, here was the output he got:
```
worker started
worker ended
```
Judging from the printed messages, the child process had finished and exited. But the main program just hung there, refusing to exit. He waited and waited, but nothing happened. (You are encouraged to try it out and see for yourself!) Bob was stumped. What could be the problem? He checked the code, but it looked perfectly fine. He scratched his head, puzzled.
And the strange part is that if you pop the message off the queue before calling p.join(), the program exits normally.
Spoiler alert: you should always empty a queue before exiting a program.
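Here is Bob's script with that one change applied; uncommenting the q.get() call is all it takes for the program to exit cleanly:

```python
from multiprocessing import Process, Queue


def worker(q: Queue):
    data = '1' * 105656
    q.put(data, block=False)
    print("worker ended")


def main():
    q = Queue()
    p = Process(target=worker, args=(q,))
    p.start()
    print("worker started")
    q.get()   # drain the queue first...
    p.join()  # ...and now the join returns
    print("worker joined")
    print('all done')


if __name__ == "__main__":
    main()
```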

The investigation

As if the previous problem wasn't enough, Bob noticed another strange behavior in this program. It would exit normally if he sent a small message, but if he sent a larger one, it would just hang there, as if lost in space. He turned to me for help. To pin down the exact data size that would not block the program, I ran the following script:
```python
from multiprocessing import Process, Queue
import time


def worker(q: Queue, data_size: int):
    # data = [42] * data_size    # max data_size: 32726
    # data = [3.14] * data_size  # max data_size: 7278
    # data = [True] * data_size  # max data_size: 65386
    data = '1' * data_size       # max data_size: 65514
    q.put(data, block=False)


def main():
    q = Queue()
    right = 105656
    left = 0
    while True:
        if left >= right - 1:
            break
        data_size = (left + right) // 2
        print(f"data size: {data_size}")
        p = Process(target=worker, args=(q, data_size))
        p.start()
        time.sleep(0.1)
        if p.is_alive():
            # it hangs
            right = data_size
        else:
            # it exits
            left = data_size
        q.get()
        p.join()
    print(f'max data_size: {left}')


if __name__ == "__main__":
    main()
```
The program exited whenever the data size was no larger than 65514, which is suspiciously close to 65536, or 2^16. Coincidence? I don't think so. As you can see, I also tried a few other data types, and the maximum data sizes all turned out to be close to 65536 divided by some small integer.
| data type | max data size | factor |
| --- | --- | --- |
| char | 65514 | 65536 / 65514 = 1.0003358060872485 |
| integer | 32726 | 65536 / 32726 = 2.0025667664853635 |
| float | 7278 | 65536 / 7278 = 9.004671613080516 |
| boolean | 65386 | 65536 / 65386 = 1.0022940690667728 |

The queues are to blame

Ah, the plot thickens! The culprit behind Bob's hanging program was the multiprocessing queue. You see, on Linux systems Python implements multiprocessing.Queue on top of pipes. And pipes have a default buffer size of 16 pages, which comes to 65536 bytes (16 × 4096). Writing a message larger than this will block the writer.
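You can observe this limit without any multiprocessing at all. The following sketch (Linux-specific, and independent of Bob's code) puts the write end of a raw pipe into non-blocking mode and checks how many bytes it accepts before it would block:

```python
import fcntl
import os

r, w = os.pipe()
# Make the write end non-blocking so this demo can't hang.
flags = fcntl.fcntl(w, fcntl.F_GETFL)
fcntl.fcntl(w, fcntl.F_SETFL, flags | os.O_NONBLOCK)

# Try to write far more than the buffer can hold.
written = os.write(w, b"x" * 200_000)
print(f"the pipe accepted {written} bytes")  # typically 65536 on Linux

os.close(r)
os.close(w)
```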
To make things worse, the queue starts a background thread (the feeder thread) to do the actual data writing. The intention, I guess, was to let your code keep executing without waiting on the slow pipe I/O. But joining this thread then becomes part of the cleanup process. In Bob's code, the child process begins this cleanup right after printing "worker ended". It internally calls join on the feeder thread, but the thread won't quit because it still has data to write. This join call is what blocks the child process, and consequently blocks the main process from exiting.
Emptying the queue by calling q.get() essentially removes this data from the buffer. It allows the feeder thread to finish writing the data to the underlying pipe and then exit. That's why p.join() on the child process returns if you pop the message off the queue before calling it.
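For completeness, the standard library also offers an escape hatch: Queue.cancel_join_thread() tells the queue not to wait for the feeder thread at process exit. Here is a minimal sketch of it applied to Bob's worker; be warned that any data still sitting in the buffer can be silently lost:

```python
from multiprocessing import Process, Queue


def worker(q: Queue):
    q.put('1' * 105656, block=False)
    # Don't block process exit on the feeder thread.
    # Warning: data not yet flushed to the pipe may be lost.
    q.cancel_join_thread()


if __name__ == "__main__":
    q = Queue()
    p = Process(target=worker, args=(q,))
    p.start()
    p.join()  # returns even though the queue was never drained
    print('all done')
```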
But why does the program start hanging past 65514 instead of 65536 for the char data type? Well, after some digging into the source code, here's what I found:
  1. Before the feeder thread actually writes the data, it first tells the receiving end how many bytes to expect. This header normally takes up 4 bytes.
  2. All data is serialized before being written to the pipe. Pickle is the default serializer.
Here is a script to confirm my findings:
```python
from typing import Any
import pickle as pkl
import struct


def get_raw_data(data: Any) -> bytes:
    '''
    This is a simplified version of the data preparing process.
    For more information, check out
    multiprocessing.connection.Connection._send_bytes
    '''
    serialized_data = pkl.dumps(data)
    header = struct.pack("!i", len(serialized_data))
    return header + serialized_data


if __name__ == "__main__":
    char_data = '1' * 65514
    int_data = [42] * 32726
    float_data = [3.14] * 7278
    bool_data = [True] * 65386
    print(f"char data byte size: {len(get_raw_data(char_data))}")
    print(f"integer data byte size: {len(get_raw_data(int_data))}")
    print(f"float data byte size: {len(get_raw_data(float_data))}")
    print(f"boolean data byte size: {len(get_raw_data(bool_data))}")
```
It prints out:
```
char data byte size: 65536
integer data byte size: 65536
float data byte size: 65536
boolean data byte size: 65536
```
Case closed.

The takeaway lesson

When using multiprocessing in Python, you need to be aware of how multiprocessing queues work. Multiprocessing queues can cause your program to hang if you don't handle them properly, and this can be particularly tricky to debug.
One important thing to remember is that you should always empty the queue before exiting your program. If there are any items left in the queue, your program may not exit cleanly and can hang indefinitely.
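If you don't know how many items are left, a small drain helper like the sketch below does the job before you join the producer. Items still in flight through a feeder thread may not be visible yet, so treat this as a sketch rather than a bulletproof shutdown routine:

```python
import queue  # multiprocessing queues raise queue.Empty


def drain(q) -> None:
    """Pop everything currently visible in the queue."""
    while True:
        try:
            q.get_nowait()
        except queue.Empty:
            break
```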
Another thing to keep in mind is that when you put an object onto a multiprocessing queue, the object might not actually leave the process's memory space even though the function call has returned. This can lead to out-of-memory errors that are difficult to diagnose.