CSCI.4210 Operating Systems Fall, 2009 Class 15
Deadlock

When two trains approach each other at a crossing, both shall come to a full stop and neither shall start up again until the other has gone. (Statute passed by the Kansas Legislature)

Recall that one definition of an operating system is a resource allocator. There are many resources that can be allocated to only one process at a time, and we have seen several operating system features that allow this, such as mutexes, semaphores or file locks.

Sometimes a process has to reserve more than one resource. For example, a process which copies files from one tape to another generally requires two tape drives. A process which deals with databases may need to lock multiple records in a database.

In general, resources allocated to a process are not preemptable; this means that once a resource has been allocated to a process, there is no simple mechanism by which the system can take the resource back from the process unless the process voluntarily gives it up or the system administrator kills the process. This can lead to a situation called deadlock. A set of processes or threads is deadlocked when each process or thread is waiting for a resource to be freed which is controlled by another process. Here is an example of a situation where deadlock can occur.

Mutex M1, M2;

/* Thread 1 */
while (1) {
   NonCriticalSection();
   Mutex_lock(&M1);      /* locks M1 first, then M2 */
   Mutex_lock(&M2);
   CriticalSection();
   Mutex_unlock(&M2);
   Mutex_unlock(&M1);
}

/* Thread 2 */
while (1) {
   NonCriticalSection();
   Mutex_lock(&M2);      /* locks M2 first, then M1 -- the opposite order */
   Mutex_lock(&M1);
   CriticalSection();
   Mutex_unlock(&M1);
   Mutex_unlock(&M2);
}
Suppose thread 1 is running and locks M1, but before it can lock M2, it is interrupted. Thread 2 starts running; it locks M2, but when it tries to lock M1, it is blocked because M1 is already locked by thread 1. Eventually thread 1 starts running again and tries to lock M2, but it is blocked because M2 is already locked by thread 2. Both threads are blocked; each is waiting for an event which will never occur.

Traffic gridlock is an everyday example of a deadlock situation.

In order for deadlock to occur, four conditions must be true.

  1. Mutual exclusion - Each resource is either currently allocated to exactly one process or it is available.
  2. Hold and wait - Processes currently holding resources can request new resources.
  3. No preemption - Once a process holds a resource, it cannot be taken away by another process or the kernel.
  4. Circular wait - Each process is waiting to obtain a resource which is held by another process.

The dining philosophers problem discussed in an earlier section is a classic example of deadlock. Each philosopher picks up his or her left fork and waits for the right fork to become available, but it never does.

Deadlock can be modeled with a directed graph. In a deadlock graph, vertices represent either processes (circles) or resources (squares). A process which has acquired a resource is shown with an arrow (edge) from the resource to the process. A process which has requested a resource which has not yet been assigned to it is modeled with an arrow from the process to the resource. If these create a cycle, there is deadlock.

The deadlock situation in the above code can be modeled like this.

This graph shows an extremely simple deadlock situation, but it is also possible for a more complex situation to create deadlock. Here is an example of deadlock with four processes and four resources.

There are a number of ways that deadlock can occur in an operating system. We have seen some examples; here are two more.

Solutions to deadlock

There are several ways to address the problem of deadlock in an operating system.

Ignore deadlock

The text refers to this as the Ostrich Algorithm (with apologies to ostriches). Just hope that deadlock doesn't happen. In general, this is a reasonable strategy. Deadlock is unlikely to occur very often; a system can run for years without deadlock occurring. If the operating system has a deadlock prevention or detection system in place, this will have a negative impact on performance (slow the system down) because whenever a process or thread requests a resource, the system will have to check whether granting this request could cause a potential deadlock situation.

If deadlock does occur, it may be necessary to bring the system down, or at least manually kill a number of processes, but even that is not an extreme solution in most situations.

Deadlock detection and recovery

As we saw above, if there is only one instance of each resource, it is possible to detect deadlock by constructing a resource allocation/request graph and checking for cycles. Graph theorists have developed a number of algorithms to detect cycles in a graph. The book discusses one of these. It uses only one data structure, L, a list of nodes.

A cycle detection algorithm

For each node N in the graph

  1. Initialize L to the empty list and designate all edges as unmarked
  2. Add the current node to L and check to see if it appears twice. If it does, there is a cycle in the graph.
  3. From the given node, check to see if there are any unmarked outgoing edges. If yes, go to the next step, if no, skip the next step
  4. Pick an unmarked edge, mark it, then follow it to the new current node and go to step 3.
  5. We have reached a dead end. Go back to the previous node and make that the current node. If the current node is the starting node and there are no unmarked edges, there are no cycles in the graph. Otherwise, go to step 3.
Let's work through an example with five processes and five resources. Here is the resource request/allocation graph.

The algorithm needs to search each node; let's start at node P1. We add P1 to L and follow the only edge to R1, marking that edge. R1 is now the current node so we add that to L, checking to confirm that it is not already in L. We then follow the unmarked edge to P2, marking the edge, and making P2 the current node. We add P2 to L, checking to make sure that it is not already in L, and follow the edge to R2. This makes R2 the current node, so we add it to L, checking to make sure that it is not already there. We are now at a dead end so we back up, making P2 the current node again. There are no more unmarked edges from P2 so we back up yet again, making R1 the current node. There are no more unmarked edges from R1 so we back up yet again, making P1 the current node. Since there are no more unmarked edges from P1 and since this was our starting point, we are through with this node (and all of the nodes visited so far).

We move to the next unvisited node, P3, and initialize L to the empty list. We add P3 to L, then follow the unmarked edge to R1, putting R1 on L. Continuing, we make P2 the current node and then R2. Since we are at a dead end, we repeatedly back up until P3 becomes the current node again.

L now contains P3, R1, P2, and R2. P3 is the current node, and it has another unmarked edge to R3. We make R3 the current node, add it to L, follow its edge to P4. We repeat this process, visiting R4, then P5, then R5, then P3. When we visit P3 again we note that it is already on L, so we have detected a cycle, meaning that there is a deadlock situation.

Once deadlock has been detected, it is not clear what the system should do to correct the situation. There are three strategies.

  1. Preemption - take an already allocated resource away from one process and give it to another. This is often problematic; consider taking a printer away from a process whose print job is half finished.
  2. Rollback - if the system periodically checkpoints process state, a deadlocked process can be rolled back to a point before it acquired the contested resource, which can then be given to another process.
  3. Kill one or more of the deadlocked processes - the crudest but simplest strategy; the victim's resources are released so the remaining processes can proceed.

Deadlock avoidance

The above solution allowed deadlock to happen, then detected that deadlock had occurred and tried to fix the problem after the fact. Another solution is to avoid deadlock by only granting resources if granting them cannot result in a deadlock situation later.

Processes often do not make all of their requests at once. They get a resource, do something with it for a while, request another resource, maybe release a resource etc.

We can avoid deadlock by checking to see if we are in a safe state. A state is safe if there is some scheduling order in which every process can run to completion, even if all of them suddenly request all of their resources.

An unsafe state is one which does not have this feature. Note that an unsafe state does not necessarily mean deadlock because a particular process may run to completion before another process actually makes all of its requests.

The Banker's algorithm - only allocate a resource if it would lead to a safe state.

Here is a banking example. A bank has granted each of four customers a line of credit. Customers do not necessarily use all of their credit at once; they will make a request for some of it, perhaps pay some of it back later, and so on.

In this example, the banker knows that not all customers will ask for their full credit at once, so it keeps only 10 units on hand to lend even though the sum of all of the credit lines is 22 units.

Aside: this is a good example of a mixed metaphor (bankers lending money and processes requesting resources); you have to flip back and forth between the two to follow the logic.

Customer  Has  Max
A          1    6
B          1    5
C          2    4
D          4    7

This is a safe state, because with two units left, the Banker can delay any requests except those from C. C will eventually run to completion and free up its resources (pay off the loan), making more money available for the others.

Customer  Has  Max
A          1    6
B          2    5
C          2    4
D          4    7

This is an unsafe state, because if everyone requests the max, the Banker cannot give enough additional units to any of the customers in order to let them run to completion.

However, this works only if the system knows in advance what requests for resources a process will be making, and this is an unrealistic assumption. The text describes the Banker's algorithm but then points out that it is essentially impossible to implement because of this assumption.

Deadlock Prevention

The difference between deadlock avoidance and deadlock prevention is a little subtle. Deadlock avoidance refers to a strategy where whenever a resource is requested, it is only granted if it cannot result in deadlock. Deadlock prevention strategies involve changing the rules so that processes will not make requests that could result in deadlock.

Recall the four conditions for deadlock: mutual exclusion, hold and wait, no preemption, and circular wait.

If we can change the rules of allocating resources so that one of these conditions is never true, we can prevent deadlock.

Attacking the mutual exclusion condition does not hold much promise; some resources, such as printers and tape drives, are intrinsically not sharable.

We can address the hold and wait condition by requiring all processes to make all of their requests when they start, and only allocating the resources if all of them are available. This works but is insanely inefficient.

A variant of this which holds more promise is that whenever a process makes a new request, it has to temporarily release all of the resources that it currently holds.

Eliminating the no preemption condition is hard to implement in practice; taking a resource away from a process in the middle of its use is rarely feasible.

Attacking the circular wait condition holds more promise under some circumstances. Here is one way to do it.

Suppose every possible resource is numbered (easy enough in theory, but often hard in practice), and processes must make their requests in order; that is, they cannot request a resource with a number lower than any of the resources that they have been granted so far.

This prevents deadlock. Here is an informal proof. At any instant, one of the assigned resources will be highest. The process holding this resource will not request a resource already assigned (although it may request a higher numbered resource) and therefore it will run to completion and release its resources. This will allow some other process to run to completion.

As an example, consider the dining philosophers problem. Suppose each chopstick is numbered, and philosophers always have to pick up the lower numbered chopstick before the higher numbered chopstick. Philosopher 5 picks up chopstick 4, philosopher 4 picks up chopstick 3, philosopher 3 picks up chopstick 2, and philosopher 2 picks up chopstick 1. Philosopher 1 is hungry, and without this rule would pick up chopstick 5, thus causing deadlock. However, if the lower number rule is in effect, he or she has to pick up chopstick 1 first, and since it is already in use, he or she is blocked. Philosopher 5 picks up chopstick 5, eats, and puts both chopsticks down, allowing philosopher 4 to eat. Eventually everyone gets to eat.


Livelock

There is a variant of deadlock called livelock. This is a situation in which two or more processes continuously change their state in response to changes in the other process(es) without doing any useful work. This is similar to deadlock in that no progress is made but differs in that neither process is blocked or waiting for anything.

A human example of livelock would be two people who meet face-to-face in a corridor and each moves aside to let the other pass, but they end up swaying from side to side without making any progress because they always move the same way at the same time.

Addressing deadlock in real systems

Deadlock is a terrific theoretical problem for graduate students, but none of the solutions discussed above can be implemented in a real world, general purpose operating system. It would be difficult to require a user program to make requests for resources in a certain way or in a certain order. As a result, most operating systems use the ostrich algorithm.

Some specialized systems have deadlock avoidance/prevention mechanisms. For example, many database operations involve locking several records, and this can result in deadlock, so database software often has a deadlock prevention algorithm.

The Unix file locking system lockf has a deadlock detection mechanism built into it. Whenever a process attempts to lock a file or a record of a file, the operating system checks whether that process has locked other files or records. If it has, the system uses a graph algorithm similar to the one discussed above to see whether granting the request would cause deadlock. If it would, the request fails: the lockf system call returns -1 and errno is set to EDEADLK.

Communication Deadlock

Another variant of deadlock which is very common is Communication Deadlock. Many communication protocols involve one side sending a request to the other side, and the other side replies. If the request is somehow lost, then both sides are left waiting for an event which will never occur.

The simple and obvious solution to this is that when either side sends a message to the other, it sets a timer, and if after a while, no acknowledgement is received, the message is sent again.
