Why does a deadlock occur if sending the Terminate message and calling thread.join() happen in one loop (Rust book 20.3)?


This is from The Rust Programming Language, section 20.3, Graceful Shutdown and Cleanup: https://doc.rust-lang.org/book/ch20-03-graceful-shutdown-and-cleanup.html

Here is the output of the code below, which can deadlock:

        Running target\debug\main.exe
        Worker 0 got a job; running.
        Shutting down. Sending terminate message to all workers. Shutting down all workers.
        Shutting down worker 0
        Worker 1 got a job; running.
        Worker 1 was told to terminate.
        error: process didn't exit successfully: target\debug\main.exe (exit code: 0xc000013a, STATUS_CONTROL_C_EXIT) ^C

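        // Single loop: each iteration sends one Terminate and then blocks in join().
        for worker in &mut self.workers {
            self.sender.send(Message::Terminate).unwrap();

            println!("Shutting down worker {}", worker.id);
            if let Some(thread) = worker.thread.take() {
                // Blocks until this worker's thread exits. If another worker consumed
                // the Terminate sent above, the next send is never reached.
                thread.join().unwrap();
            }
        }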

Could you help me understand why this logic can cause a deadlock?

""" To better understand why we need two separate loops, imagine a scenario with two workers. If we used a single loop to iterate through each worker, on the first iteration a terminate message would be sent down the channel and join called on the first worker’s thread. If that first worker was busy processing a request at that moment, the second worker would pick up the terminate message from the channel and shut down. We would be left waiting on the first worker to shut down, but it never would because the second thread picked up the terminate message. Deadlock! """

I was thinking that after the first worker finishes its task, it would receive the next terminate message and then break out of its loop.

Does thread.join() prevent the first worker from receiving new messages from the channel? It seems not.

Here is my understanding of the logic and steps:

  1. On the first iteration, a terminate message is sent down the channel.
  2. The first worker is busy processing a request at that moment, so the second worker receives the terminate message and exits its loop.
  3. thread.join() is called on the first worker's thread from main. That thread has been moved out of the ThreadPool by worker.thread.take(), which leaves worker.thread as Option::None.
  4. Now there is no worker.thread left for the first worker in ThreadPool::drop.
  5. On the second iteration, ThreadPool::drop() sends another terminate message down the channel.
  6. There is no worker.thread to process that message, because the second worker has already exited its loop. Then, maybe, the second worker's thread is joined to main().
  7. In the end, the moved first worker thread is stuck in an infinite loop, and main() waits forever for it to finish.

But there is another thought: even though the moved first worker thread is no longer in the ThreadPool, the thread still holds the receiver, so it could receive the terminate message and then break out of its loop.

I'm still confused. ^_^

rust
asked on Stack Overflow Jun 12, 2020 by Charlie 木匠 • edited Jun 12, 2020 by Charlie 木匠

1 Answer


The problem is that the second terminate message might never get sent. When we call thread.join().unwrap();, we wait until that thread finishes before continuing. So if the first thread never terminates (because the second worker got the termination message), then we'll never progress past thread.join().unwrap(); in the first iteration of the loop.
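To see the blocking behaviour of join() in isolation, here is a minimal sketch that is not taken from the book. The helper name join_order and the 100 ms sleep are assumptions made for illustration; the sleep stands in for a worker that is busy.

```rust
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::Duration;

// Hypothetical helper: records the order in which two events happen.
fn join_order() -> Vec<&'static str> {
    let events = Arc::new(Mutex::new(Vec::new()));

    let worker_events = Arc::clone(&events);
    let handle = thread::spawn(move || {
        // Stand-in for a worker that is busy for a while.
        thread::sleep(Duration::from_millis(100));
        worker_events.lock().unwrap().push("worker finished");
    });

    // join() blocks the calling thread until the spawned thread has returned.
    handle.join().unwrap();
    events.lock().unwrap().push("main continued after join");

    let recorded = events.lock().unwrap().clone();
    recorded
}

fn main() {
    // prints ["worker finished", "main continued after join"]
    println!("{:?}", join_order());
}
```

Nothing after join() runs on the calling thread until the joined thread is done, which is why the second send in the single loop is never reached if the first worker keeps waiting.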

Think about this possible sequence of events.

  1. (thread 1) Worker 1 starts a job.
  2. (thread 2) Worker 2 checks for a message (nothing).
  3. (main thread) Termination message is sent.
  4. (thread 2) Worker 2 checks for a message (termination message).
  5. (thread 2) Thread 2 ends.
  6. (thread 1) Worker 1's job ends.
  7. (thread 1) Worker 1 checks for a message (nothing).
  8. (main thread) Thread 1 is joined (main thread is now just waiting).
  9. (thread 1) Worker 1 checks for a message (nothing).
  10. (thread 1) Worker 1 checks for a message (nothing).
  11. ... (deadlock)

Worker 1 will never get a message because the only messages being sent are those in the main thread. But the main thread is waiting for thread 1 to finish. That's the definition of deadlock. Thread 1 won't finish until the main thread sends a termination message, and the main thread won't send a termination message until thread 1 finishes.
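The sequence above can be replayed deterministically with a small sketch. This is not the book's code: spawn_worker, replay_interleaving, and the extra started/release/exited channels are hypothetical names added only to force the ordering and to observe it. A recv_timeout call stands in for the join() that would otherwise hang. Note that recv() blocks until a message arrives, so "checks for a message (nothing)" corresponds to the worker simply waiting inside recv().

```rust
use std::sync::{mpsc, Arc, Mutex};
use std::thread;
use std::time::Duration;

type Job = Box<dyn FnOnce() + Send + 'static>;

enum Message {
    NewJob(Job),
    Terminate,
}

// Hypothetical helper: a worker loop in the style of the book's Worker::new.
// It reports its id on `exited` when it receives Terminate.
fn spawn_worker(
    id: usize,
    receiver: Arc<Mutex<mpsc::Receiver<Message>>>,
    exited: mpsc::Sender<usize>,
) -> thread::JoinHandle<()> {
    thread::spawn(move || loop {
        // recv() blocks until a message arrives.
        let message = receiver.lock().unwrap().recv().unwrap();
        match message {
            Message::NewJob(job) => job(),
            Message::Terminate => {
                exited.send(id).unwrap();
                break;
            }
        }
    })
}

// Returns (id of the first worker to exit, whether worker 1 was still waiting).
fn replay_interleaving() -> (usize, bool) {
    let (sender, receiver) = mpsc::channel::<Message>();
    let receiver = Arc::new(Mutex::new(receiver));
    let (exited_tx, exited_rx) = mpsc::channel::<usize>();
    let (started_tx, started_rx) = mpsc::channel::<()>();
    let (release_tx, release_rx) = mpsc::channel::<()>();

    // Worker 1 starts a job that runs until main releases it.
    let worker1 = spawn_worker(1, Arc::clone(&receiver), exited_tx.clone());
    sender
        .send(Message::NewJob(Box::new(move || {
            started_tx.send(()).unwrap();
            release_rx.recv().unwrap();
        })))
        .unwrap();
    started_rx.recv().unwrap();

    // Worker 2 is idle, so it is the one that receives the single Terminate.
    let worker2 = spawn_worker(2, Arc::clone(&receiver), exited_tx.clone());
    sender.send(Message::Terminate).unwrap();
    let first_exit = exited_rx.recv().unwrap();

    // Worker 1 finishes its job and goes back to waiting on the channel.
    release_tx.send(()).unwrap();
    // The single-loop drop would be blocked in join() on worker 1 at this point.
    // A timeout stands in for that join() so the sketch can finish.
    let worker1_still_waiting = exited_rx
        .recv_timeout(Duration::from_millis(300))
        .is_err();

    // The second Terminate, which the single loop never gets to send, unblocks it.
    sender.send(Message::Terminate).unwrap();
    worker1.join().unwrap();
    worker2.join().unwrap();
    (first_exit, worker1_still_waiting)
}

fn main() {
    let (first_exit, still_waiting) = replay_interleaving();
    println!("first worker to exit: {first_exit}");
    println!("worker 1 still waiting after its job ended: {still_waiting}");
}
```

In this sketch worker 2 exits first and worker 1 is still waiting once its job has ended. It only terminates after a second Terminate is sent, and in the single-loop version the main thread cannot send that message because it is stuck in join().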

This doesn't have anything to do with whether the thread is in the threadpool or not. Yes, thread 1 is no longer in the threadpool after worker.thread.take(), but the thread still exists and the main thread is still waiting for it.
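For comparison, here is a minimal self-contained sketch of the two-loop shutdown that the book's explanation describes. The pool is stripped down from the book's version, and run_jobs_then_shutdown plus the atomic counter are hypothetical additions used only to check that shutdown completes.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

type Job = Box<dyn FnOnce() + Send + 'static>;

enum Message {
    NewJob(Job),
    Terminate,
}

struct Worker {
    id: usize,
    thread: Option<thread::JoinHandle<()>>,
}

struct ThreadPool {
    workers: Vec<Worker>,
    sender: mpsc::Sender<Message>,
}

impl ThreadPool {
    fn new(size: usize) -> ThreadPool {
        let (sender, receiver) = mpsc::channel::<Message>();
        let receiver = Arc::new(Mutex::new(receiver));
        let workers: Vec<Worker> = (0..size)
            .map(|id| {
                let receiver = Arc::clone(&receiver);
                let thread = thread::spawn(move || loop {
                    let message = receiver.lock().unwrap().recv().unwrap();
                    match message {
                        Message::NewJob(job) => job(),
                        Message::Terminate => break,
                    }
                });
                Worker { id, thread: Some(thread) }
            })
            .collect();
        ThreadPool { workers, sender }
    }

    fn execute<F: FnOnce() + Send + 'static>(&self, f: F) {
        self.sender.send(Message::NewJob(Box::new(f))).unwrap();
    }
}

impl Drop for ThreadPool {
    fn drop(&mut self) {
        // First loop: queue one Terminate per worker. send() does not block.
        for _ in &self.workers {
            self.sender.send(Message::Terminate).unwrap();
        }
        // Second loop: a Terminate is already queued for every worker,
        // so each join() eventually returns regardless of who received which one.
        for worker in &mut self.workers {
            println!("Shutting down worker {}", worker.id);
            if let Some(thread) = worker.thread.take() {
                thread.join().unwrap();
            }
        }
    }
}

// Hypothetical helper: runs `jobs` jobs, drops the pool, returns how many jobs ran.
fn run_jobs_then_shutdown(workers: usize, jobs: usize) -> usize {
    let counter = Arc::new(AtomicUsize::new(0));
    {
        let pool = ThreadPool::new(workers);
        for _ in 0..jobs {
            let counter = Arc::clone(&counter);
            pool.execute(move || {
                counter.fetch_add(1, Ordering::SeqCst);
            });
        }
    } // pool is dropped here, which runs the two-loop shutdown
    counter.load(Ordering::SeqCst)
}

fn main() {
    println!("jobs completed: {}", run_jobs_then_shutdown(2, 8));
}
```

Because the channel is first-in, first-out, all queued jobs are received before any Terminate, and there are exactly as many Terminate messages as workers, so no join() is left waiting on a thread that will never be told to stop.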

answered on Stack Overflow Jun 12, 2020 by SCappella

User contributions licensed under CC BY-SA 3.0