CSCI.4210 Operating Systems Fall, 2009 Class 6
Thread System Calls

POSIX Thread System Calls

The POSIX standard defines a complete set of system calls for threads called pthreads. The pthread function to create a new thread within the same process has the following rather ugly prototype:


#include <pthread.h>

int pthread_create(pthread_t *thread, const pthread_attr_t *attr,
                   void *(*start_routine)(void *), void *arg);

This system call has four arguments. The first, thread, is a pointer to a pthread_t, an opaque data type similar to a Windows handle; the calling program may need this value later for thread synchronization. The second argument, of type pthread_attr_t, allows the calling program to set some of the attributes of the thread, such as the size of its stack. If this is set to NULL, the thread has default attributes, which are appropriate for most purposes.

The third argument, start_routine, is the name of the function where the newly created thread will start. This function must have the following prototype:
void *start_routine(void *arg)
(replacing start_routine with the name of the function). It takes exactly one argument, a pointer. The last argument to pthread_create, arg, is the pointer that will be passed to this function.

Unlike most system calls, pthread_create returns zero if it successfully created the new thread and a positive error number if it failed; it does not set the external global variable errno. The return value itself indicates why it failed.

The threads run concurrently, meaning that they all appear to be running at the same time, and there is no assurance about which thread will run first. This can create a race condition: two or more events happen at about the same time, with no assurance that one will happen before the other; the order of the events is indeterminate.

One possibility with threads is that the main calling thread can terminate before all of the threads have terminated. Since all threads are running in the same process space, when the main thread terminates, the entire process dies. In fact, if any thread calls exit(), the entire process immediately terminates. This means that it is possible for some of the threads to die before they have completed their work.

There are two other POSIX thread system calls that help to address this problem. To terminate a single thread without killing the entire process, use the system call
void pthread_exit(void *value_ptr);
The argument is a pointer to a return value, but it can be set to NULL.

The calling thread can wait for a particular thread to terminate with the call
int pthread_join(pthread_t thread, void **value_ptr);
The first argument is the ID of the thread being waited for, and the second is a pointer to the value passed back by the thread; it can be set to NULL. This function blocks until the thread terminates. Note the similarity in function between this call and the waitpid function, which waits for a child process to die.

Here is some very simple thread code

/* simplethread.c */
/* to compile on freebsd, append -pthread to the compile
   line to link to the pthread library */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h> /* for malloc */
#include <unistd.h> /* for sleep */

void *ThreadRoutine(void *arg)
{
  int a;
  a = *(int *)arg;
  printf("My argument is %d\n",a);
  sleep(a);
  printf("Thread %d woke up\n",a);
  return NULL;
}

int main()
{
  pthread_t threadIds[5];
  int i;
  int *argptr;
  int retval;
  for (i=0;i<5;i++) {
    argptr = (int *)malloc(sizeof(int));
    *argptr = i;
    retval = pthread_create(&threadIds[i], NULL,
             ThreadRoutine, argptr);
    if (retval == 0)
        printf("Thread %d was created\n",i);
    else
        fprintf(stderr,"Error creating thread %d: error %d\n",
                i, retval);
  }
  for (i=0;i<5;i++) {
    retval = pthread_join(threadIds[i],NULL);
    if (retval != 0)
        fprintf(stderr,"Error in pthread_join: error %d\n",retval);
  }
  return 0;
}

Passing arguments to the thread routine requires special attention. The argument is a pointer, and it is usually important that the argument to each thread point to different memory, which is why the loop that creates the threads allocates new memory with malloc each time. Also, make sure that the calling function does not change the value of the memory that arg points to after it creates the thread. Because there is a race condition between the calling function and the newly created thread, we do not know whether the thread or the calling function will run first.

If you wrote the loop in the obvious (but wrong) way, like this:

  for (i=0;i<5;i++) {
    retval = pthread_create(&threadIds[i], NULL,
             ThreadRoutine, &i);
    ...
the program would run without errors, but the argument to all five threads would probably be the same, rather than different for each because you are passing the same address each time.

Returning a value from a dying thread using pthread_exit() will test your pointer skills. The argument to this function is a void pointer, which can point to any data type. The second argument to pthread_join is a pointer to a pointer to void, which can also point to any data type (presumably the same type as the argument to pthread_exit()). When the call to pthread_join returns, the second argument will point to the value returned by pthread_exit for that thread.

Here is a short program which demonstrates this. It makes copies of files. The names of the files to be copied are passed in as arguments to main (up to a max of 10). A separate thread is created for each file. The name of the copied file is formed by prepending the string Copy_Of_ to the filename (for example, if the filename was myfile, the copy would be called Copy_Of_myfile). The thread returns the number of bytes copied, and the main thread then displays the total number of bytes copied in all of the files.

/* A program that copies files in parallel
   using pthreads */
/* Alert: link to the pthreads library by appending
   -pthread to the compile statement */
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <pthread.h>
#include <string.h>
#include <errno.h>
#include <stdlib.h>
#include <unistd.h>

#define BUFSIZE   256

void *copy_file(void *arg)
{
    int infile, outfile;
    int bytes_read = 0;
    int bytes_written = 0;
    char buffer[BUFSIZE];
    char outfilename[128];
    int *ret;

    ret = (int *)malloc(sizeof(int));
    *ret = 0;
    infile = open(arg,O_RDONLY);
    if (infile < 0)  {
        fprintf(stderr,"Error opening file %s: ", 
             (char *)arg);
        fprintf(stderr,"%s\n",strerror(errno));
        pthread_exit(ret);
    }
    strcpy(outfilename,"Copy_Of_");
    strcat(outfilename,arg); 
    outfile = open(outfilename,O_WRONLY | O_CREAT | O_TRUNC, 
             S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH);
    if (outfile < 0) {
        fprintf(stderr,"Error opening file %s",outfilename);
        fprintf(stderr,"%s\n",strerror(errno));
        pthread_exit(ret);
    }
    while (1) {
         bytes_read = read(infile, buffer, BUFSIZE);
         if (bytes_read == 0) 
            break;
	  else if (bytes_read < 0) {
		  perror("reading");
                  pthread_exit(ret);
	  }
	  bytes_written = write(outfile, buffer, bytes_read);
          if (bytes_written != bytes_read) {
	         perror("writing");
                 pthread_exit(ret);
	  }
          *ret += bytes_written;
    }
    close(infile);
    close(outfile);
    pthread_exit(ret);
    return ret; /* we never get here, but the compiler
                      likes to see this line */
}

int main(int argc, char *argv[])
{
    pthread_t tid[10]; /* max of 10 possible threads */

    int total_bytes_copied = 0;
    int *bytes_copied_p;
    int i;
    int ret;

    for (i=1;i < argc;i++) {
      ret = pthread_create(&tid[i], NULL, copy_file, 
               (void *)argv[i]);
      if (ret != 0) {
           fprintf(stderr,"Could not create thread %d: %s\n",
            i, strerror(ret));
      }
    }
    /* wait for copies to complete */

    for (i=1;i < argc;i++) {
      ret = pthread_join(tid[i],(void **)&(bytes_copied_p));
      if (ret != 0) {
         fprintf(stderr,"No thread %d to join: %s\n",
                 i, strerror(ret));
      }
      else {
         printf("Thread %d copied %d bytes from %s\n",
               i, *bytes_copied_p, argv[i]);
         total_bytes_copied += *bytes_copied_p;
      }
    }
    printf("Total bytes copied = %d\n",total_bytes_copied);
    return 0;
}
Pay particular attention to the way that pthread_exit passes a value, in this case an integer, back to the main thread, and note how the main thread uses the second argument of pthread_join to access this value. You may find the syntax (void **)&(bytes_copied_p) a little unsettling. The variable bytes_copied_p is a pointer to an integer. By preceding it with an ampersand (&) we are passing its address to the function. This permits the function to change what it points to, which is a good thing, because when you call the function it is not pointing anywhere. The function changes its value so that it points to the value returned by pthread_exit. The (void **) casts this value to a pointer to a pointer to void, which keeps the compiler happy.

The typical design of a program that uses threads is to have a main thread that creates some number of child threads and then waits for them to terminate. Both of the examples above used this design. However, any thread can create new threads, and it is not necessary for the parent thread to wait for the children to terminate.


Thought question: The thread routine has only one argument. If your thread routine needed to accept multiple arguments, how would you do this?

Win32 APIs for Threads

The Win32 API to create a new thread is

HANDLE CreateThread(
  LPSECURITY_ATTRIBUTES lpThreadAttributes,  // pointer to security attributes
  DWORD dwStackSize,                         // initial thread stack size
  LPTHREAD_START_ROUTINE lpStartAddress,     // pointer to thread function
  LPVOID lpParameter,                        // argument for new thread
  DWORD dwCreationFlags,                     // creation flags
  LPDWORD lpThreadId                         // pointer to receive thread ID
);

Although this takes a few more arguments than pthread_create, it works in a very similar way. If the first argument lpThreadAttributes is set to NULL and the second argument dwStackSize is set to zero, appropriate default values will be assigned. The third argument lpStartAddress should be set to the name of the function to be called. The fourth argument lpParameter is a pointer to the argument to be passed to the function. The fifth argument dwCreationFlags should be set to zero. The last argument is a pointer to a DWORD; the function sets it to the identifier of the new thread.

The function returns a HANDLE to the new thread if successful. If it fails, the return value is NULL.

The called function must have the following signature:
DWORD WINAPI ThreadProc(LPVOID lpParameter);
(replacing ThreadProc with the name of the function).

The Win32 equivalent of pthread_join is

DWORD WaitForSingleObject(
  HANDLE hHandle,        // handle to object to wait for
  DWORD dwMilliseconds   // time-out interval in milliseconds
);
This is a very important function because it is used to wait for many kinds of events, and we will see it a lot in this course. In this case it is used to wait for a thread to terminate. The first argument hHandle is the handle returned from CreateThread. Like pthread_join, this function blocks until the thread terminates. However, unlike pthread_join, this function allows you to specify how long you are willing to wait for the event before becoming unblocked. The second argument is the number of milliseconds to wait. If this value is zero, the function returns immediately, even if the thread has not terminated. Another possible value is the keyword INFINITE, which causes the function to block indefinitely if the thread never terminates.

Here is why this may be useful. Suppose there is an infinite loop or other code in the thread function that prevents the thread from ever terminating. If the second argument of WaitForSingleObject were set to INFINITE, the main thread would be blocked forever. On the other hand, if the second argument were set to, say, 1000, it would wake up after a one second interval.

Here is a simple sample program which corresponds to the first sample program using pthreads.

#include <windows.h>
#include <stdio.h>
#include <stdlib.h> /* for malloc and exit */

DWORD WINAPI ThreadRoutine(LPVOID lpArg)
{
    int a;
    a = *(int *)lpArg;
    fprintf(stderr,"My argument is %d\n",a);
    return 0; /* the return type is DWORD, not a pointer */
}

int main()
{
	int i;
	int *lpArgPtr;
	HANDLE hHandles[5];
	DWORD ThreadId;
	
	for (i=0;i < 5;i++) {
		lpArgPtr = (int *)malloc(sizeof(int));
		*lpArgPtr = i;
		hHandles[i] = CreateThread(NULL,0,ThreadRoutine,lpArgPtr,0,&ThreadId);
		if (hHandles[i] == NULL) {
			fprintf(stderr,"Could not create Thread\n");
			exit(0);
		}
		else printf("Thread %d was created\n",ThreadId);
	}

	for (i=0;i < 5;i++) {
		WaitForSingleObject(hHandles[i],INFINITE);
	}
	return 0;
}

To terminate a particular thread without terminating the entire process, use the API
VOID ExitThread(DWORD ExitCode);
where ExitCode is a value to be returned to another thread.

A thread can read the ExitCode of another thread that has terminated with the API
BOOL GetExitCodeThread(HANDLE hThread, LPDWORD lpdwExitCode);
This should only be called after the thread has terminated, so it is usually used in combination with WaitForSingleObject.


Process and Thread Synchronization

Probably all of the programming that you have done so far is single-threaded programming, in which you can theoretically step through the entire program with a particular set of data and the result is determinate. The notion of several processes or several threads running concurrently introduces some new complexities if the processes or threads share common variables or other resources. Now the results are not necessarily determinate; different runs of the same program with the same data can produce different results if you are not careful.

Here is a very simple example. Imagine a process with several threads running concurrently. Each thread is counting the occurrences of some event, and whenever such an event occurs, the thread increments a global variable called COUNT. One of the key features of concurrent threads is that they can run in any order; you cannot make any assumptions about timing. In particular, it is possible for one thread to be preempted at any time, after which another thread runs for a while.

Assume that the assembler program to increment the global variable COUNT looks like this (this is a pseudocode assembly language; it does not correspond directly to any particular instruction set. Everything after the semicolon is a comment; COUNT is the symbolic name of an address in memory)

LOAD R1,COUNT  ; copy the value of COUNT to Register 1
ADD R1,1       ; add 1 to the value of Register 1
STORE R1,COUNT ; copy the value of Register 1 to COUNT
This is not very complicated code. However, since an interrupt can occur between any two instructions, it is possible to get the wrong answer. Here's how.

Suppose the global variable COUNT has the value 17. Thread One detects a countable event and starts to execute this code. It executes the LOAD instruction, loading the value 17 into Register 1. Before it can execute the ADD instruction, an interrupt occurs, which causes Thread One to be moved from the Running state to the Ready state. The first thing that an interrupt handler does is save the values of all of the registers so that they can be restored when the thread starts running again.

Thread Two starts to run, and it detects a countable event so it starts executing this code. It loads the value 17 into Register 1, adds one to it, and then stores the value 18 in COUNT. Thread Two is eventually preempted, and Thread One starts running again. The first thing that it will do is restore the registers to their values at the time that it was preempted, so the value 17 is copied to Register 1. The thread then executes the ADD instruction, adding 1 to Register 1, and then it stores this value, 18, to the COUNT memory location.

Notice that two events occurred, but only one of them was counted.

Here is a formalism for concurrent programming. Two (or more) processes (threads) are running concurrently. We have to assume that they are running asynchronously; that is, we cannot make any assumptions about timing. The instructions in a particular thread will occur in the prescribed order, but we cannot assume that an instruction in one thread ever occurs before a given instruction in any other thread. In other words, any possible interleaving of instructions is possible. An interrupt can occur between any two instructions, and this interrupt can result in another process taking over the CPU. We need to prove that the program will be correct with all possible interleavings of instructions.

We can assume that any one instruction will not be interrupted in the middle. An instruction is called an atomic instruction if it cannot be interrupted. The three instructions in our simple assembly code above, LOAD, ADD, and STORE, are atomic instructions; there is no way that an interrupt can occur in the middle of the ADD instruction, for example. However, we have to assume that an interrupt could occur between any two atomic instructions. If our instruction set had an atomic instruction that could increment a memory address without being interrupted, that would solve our problem, but many architectures, particularly load/store (RISC) designs, do not have such an instruction.

Concurrent programming makes debugging far more difficult. First, several different runs of a concurrent program with identical data can still produce different results because they could have run with different timing (interleaving). This means that when some kind of error occurs, it might not be possible to replicate the error. It is also likely that if a program has some kind of bug which results from concurrency, most of the time, the program will run correctly; but on rare occasion, it will produce the wrong answer.

Critical Sections

It should be obvious that concurrency is only a problem if two or more processes share common resources. If they run completely independently, problems will not arise.

A section of code that updates a resource which is shared by several concurrent processes or threads is called a critical section or critical region. As we have seen, it is important to make sure that only one process is in its critical section at any time. With the above code, if it could be guaranteed that every process would complete the update of the shared variable without risk of being interrupted, then the program would always produce the correct answer.

We can think of concurrent processes running indefinitely; they are usually running in their NonCriticalSection but occasionally they need to enter their critical section. Here is some pseudocode.

/* Process One */              /* Process Two */
while (TRUE) {                 while (TRUE) {
    NonCriticalSection();          NonCriticalSection();
    CriticalSection();             CriticalSection();
}                              }
This formalism is extremely general; it arguably describes all processes. It simply says that a process runs in its NonCriticalSection (i.e. where it does not update any shared variables or otherwise use potentially shared resources) for an unspecified period of time, periodically needs to enter its CriticalSection, and then goes back to executing in its NonCriticalSection.

I have shown the code for only two processes, but we can assume that there are a large number of similar processes running concurrently and asynchronously.

Here are the rules:

1. Mutual exclusion: at most one process may be in its critical section at any time.
2. No starvation: a process that wants to enter its critical section must eventually be allowed to do so.

To implement this, we have to add a PreProtocol before the start of the Critical Section and a PostProtocol at the end of the Critical Section to insure that these two conditions are met.

/* Process One */              /* Process Two */
while (TRUE) {                 while (TRUE) {
    NonCriticalSection();          NonCriticalSection();
    PreProtocol();                 PreProtocol();
    CriticalSection();             CriticalSection();
    PostProtocol();                PostProtocol();      
}                              }
A simple but wrong solution

One simple and obvious (but wrong) way to implement this is to have a global variable bool Occupied, which is initialized to false. When a process wants to enter its critical section, it checks this variable. If its value is true (meaning that some other process is in its critical section), it waits until the variable is set to false; it then sets the variable to true and enters its critical section. When it leaves its critical section, it sets Occupied back to false.

/* A simple but wrong solution */
bool Occupied = false;
/* Process One */                 /* Process Two */
while (TRUE) {                    while (TRUE) {
    NonCriticalSection();             NonCriticalSection();
    while (Occupied == true)          while (Occupied == true)     
        { /* do nothing */ }             { /* do nothing */ }
    Occupied = true;                  Occupied = true;
    CriticalSection();                CriticalSection();
    Occupied = false;                 Occupied = false;
}                              }
This certainly looks like it should work. When a program wants to enter its critical section, it first checks to see if any other process is in its critical section (Occupied == true), and if some other process is in its critical section, it waits for the other process to leave.

Why doesn't this work? Recall that the solution has to be correct with all possible interleavings of statements. Suppose Process One wants to enter its Critical Section. It checks to see that Occupied is false, and indeed it is. Oops! It gets interrupted, and Process Two starts running. Process Two also wants to enter its critical section, so it checks that Occupied is false; it is, so it sets Occupied to true and enters its critical section. While it is in its critical section it is interrupted, and Process One starts running again where it left off. Process One sets Occupied to true and enters its critical section. This violates the mutual exclusion principle; both processes are in their critical sections at the same time.

To solve this problem, the instruction sets of many architectures provide an instruction which tests a variable and sets its value in a single operation. Such an instruction is called TSL (Test and Set Lock). It reads the value of a variable into a register and stores a non-zero value in that memory address. The key is that this is an atomic instruction; that is, the instruction cannot be interrupted in the middle, even though it is doing two things.

If the CPU has such an instruction, then the above algorithm works: you can test Occupied and set its value in a single atomic operation.

Dekker's Algorithm

To insure that our two conditions are met, we need a somewhat more complicated and less intuitive algorithm. The first correct solution to the mutual exclusion problem was Dekker's Algorithm. This requires three global variables: bool Need1, bool Need2, and int Turn. Need1 is set to true when Process One wants to enter its critical section, and Need2 is set to true when Process Two wants to enter its critical section. A process that wants to enter its critical section sets Turn to the value of the other process and then waits until one of two conditions is true: either it is its turn, or the other process does not want to enter its critical section. When a process leaves its critical section, it sets its need variable to false. Here is the code.

/* Dekker's Algorithm */
bool Need1 = false, Need2 = false;
int turn = 1; /* the initial value of turn is arbitrary */
/* Process One */                     /* Process Two */
while (TRUE) {                           while (TRUE) {
    NonCriticalSection();                    NonCriticalSection(); 
    Need1 = true;                            Need2 = true;
    turn = 2;                                turn = 1;
    while (turn == 2 && Need2 == true)       while (turn == 1 && Need1 == true)
       { /* do nothing */ }                     { /* do nothing */ }
    CriticalSection();                       CriticalSection();
    Need1 = false;                           Need2 = false;
}                                        }
Some examples will help to clarify this algorithm. First, let's see what would happen if Process One wants to enter its critical section and Process Two is in its NonCritical section. Process One sets Need1 to true and sets turn to 2. It can enter its critical section either when turn is 1 (which it is not) or when Need2 is false. The second condition is true so it can enter its critical section.

Now consider the case where Process One is in its Critical Section and Process Two wants to enter its critical section. Process Two sets Need2 to true and Turn to 1. It can enter its critical section only when Turn is 2 or when Need1 is false. Neither of these conditions is true, so Process Two waits. Eventually, Process One leaves its critical section and sets Need1 to false. This allows Process Two to enter its critical section.

Suppose both Processes would like to enter their critical section at the same time. We can look at what happens if interrupts happen at various places.

Process One sets Need1 to true, and then gets interrupted before it can set Turn. Process Two sets Need2 to true, and then sets Turn to 1. Because Turn is 1 and Need1 is true, Process Two is blocked. Eventually Process One runs again and starts where it left off. It sets Turn to 2. Since Need2 is true and Turn is 2, Process One is blocked. Eventually Process Two starts running again. Now Turn is 2, so it can enter its critical section. Process Two eventually finishes its critical section and sets Need2 to false, so when Process One starts running again, it is no longer blocked.

While both Processes spend some time in the blocked state, the two conditions of mutual exclusion and no starvation are satisfied.

Now let's do a hard one. Let's assume that both processes want to enter their critical section at the same time, and they alternate instructions. This is unlikely to happen in real life, but we have to convince ourselves that Dekker's Algorithm works under all possible interleavings of statements.

Process One sets Need1 to true
Process Two sets Need2 to true
Process One sets Turn to 2.
Process Two sets Turn to 1.
Process One checks the condition of the while statement. Since Turn is 1, it breaks out of the loop
Process Two checks the condition of the while statement. Since Turn is 1 and Need1 is true, it is blocked, and spins in the while loop.
Process One enters its critical section.
Process Two is blocked until Process One leaves its critical section

Semaphores

Dekker's Algorithm and Peterson's Algorithm can both be implemented entirely within user space, and this is an advantage. However, they are not widely used because they suffer from a problem called busy waiting. While they are waiting for a resource to become free, they do not give up the CPU; they simply sit and waste CPU cycles. The TSL instruction has the same problem.

The solution is to have a call to the kernel which checks the value of a variable, and if this variable is set to zero, meaning that some other process is in its critical section, the process is blocked and some other process is allowed to run. When the resource becomes available (i.e. when the variable is set to one, meaning that the other process has left its critical section), the blocked process is awakened, the variable is set to zero again, and the process enters its critical section.

Historically, such a variable is called a semaphore. The term semaphore is borrowed from railroading, where a semaphore is a signal indicating whether a particular section of track was free or not. You might notice the similarity between tracks being free or not and processes entering their critical sections.

Semaphores (the computer kind) were initially conceived by Dijkstra, who was Dutch. He labeled the two operations on a semaphore P for proberen, the Dutch word for test, and V for verhogen (to increment). These are now often called Down and Up.

The Down operation checks to see if the value of the semaphore is greater than zero. If it is, it decrements the value and the process enters its critical section. If the value of the semaphore is zero, the calling process is blocked. The Up operation increments the value of the semaphore and wakes up any process which is waiting for the semaphore to be incremented.

A key feature of semaphores is that the Down and Up operations must be atomic; they run in kernel space, where the kernel can guarantee that they are not interrupted partway through.

Here is pseudocode for the semaphore operation. All processes are identical, so only one is shown:

Semaphore S=1;
while (true) {
    NonCriticalSection();
    Down(S);
    CriticalSection();
    Up(S);
}
So far, we have only discussed the situation with exactly two processes, where only one can be in its critical section at a time. Semaphores are easily generalizable to an arbitrary number of processes and to situations where the number of processes that can be in their critical section at the same time is greater than one, but still a fixed small number.

Suppose you have a case where there are several instances of a resource, three tape drives for example. At any time a process might need a tape drive, and if a tape drive is assigned to a particular process, no other process can use it. However, a process does not care which of the three tape drives it uses. In this case, the semaphore is initialized to three, and as long as its value is greater than zero, any process that needs a tape drive can get access to one. Because the Up call is always paired with and occurs after a Down call, the value of the semaphore can never be greater than three (or whatever its initial value was).

If there are many processes sharing a semaphore, there may be more than one process waiting for the semaphore to be incremented, and in such a case, the operating system needs a way to determine which of several waiting processes will be awakened and allowed to enter its critical section. The obvious method is a simple queue, using a First In First Out algorithm. However, other algorithms are possible based on process priorities.

Mutexes

A simple version of a semaphore with only two states is called a mutex, which is short for mutual exclusion. A mutex is a simple lock primitive which can be used to guard a shared resource such as a critical section, ensuring that only one process or thread at a time is using the resource. A mutex has only two states, locked and unlocked. As with a semaphore, a process which tries to enter its critical section but finds the mutex locked is typically blocked so that it does not waste CPU cycles. When the mutex is unlocked, one blocked process or thread is awakened; it relocks the mutex and enters its critical section.

mutex M;
while (true)
{
     NonCriticalSection();
     M.lock();
     CriticalSection();
     M.unlock();
}

Monitors

A monitor is a synchronization abstraction; it is like a class which enforces mutual exclusion. In particular, it guarantees that only one process can be executing any of the monitor's procedures at a particular time. As with any abstract data type, other processes must rely on the defined interface; they cannot manipulate data inside the monitor directly.

Here is an example (from Nutt, Operating Systems: A Modern Perspective). A north-south two-way road has a one-way tunnel. A southbound (or northbound) car which arrives at the tunnel can enter immediately if there are no cars in the tunnel going the opposite way. Traffic is controlled with a traffic light.

The monitor has three functions: northboundArrival(), which is called when a northbound car is approaching the tunnel; southboundArrival(), which is called when a southbound car approaches the tunnel; and depart(), which is called when a car leaves the tunnel. All three are triggered by sensors in the road.

Here is the code for the monitor

monitor tunnel {
    int northbound = 0, southbound = 0;
    traffic_signal northbound_signal = RED,
                   southbound_signal = RED;
    condition busy;
public:
    northboundArrival() {
        if (southbound > 0) busy.wait;
        northbound++;
        northbound_signal = GREEN;
        southbound_signal = RED;
    };
    southboundArrival() {
        if (northbound > 0) busy.wait;
        southbound++;
        southbound_signal = GREEN;
        northbound_signal = RED;
    };
    depart(Direction exit) {
        if (exit == north) {
            northbound--;
            if (northbound == 0)
                while(busy.queue) busy.signal;
        }
        else if (exit == south) {
             southbound--;
             if (southbound == 0) 
                 while(busy.queue) busy.signal;
        }
   };
}

No system has a primitive called a monitor; monitors are merely abstractions. Monitors can always be implemented using semaphores.