CSCI.4210 Operating Systems Fall, 2009, Class 11

More Unix Stuff

IO System Calls

You have used lots of I/O library operations: cin and cout among others in C++, and printf, scanf, fgets, etc. in C. In Unix, all of these are implemented on top of three actual system calls: open(), read() and write().

The open() system call opens a file. Here is the prototype.

       #include <sys/types.h>
       #include <sys/stat.h>
       #include <fcntl.h>

       int open(const char *pathname, int flags);
       int open(const char *pathname, int flags, mode_t mode);

Every running process has a table of open file descriptors. A successful call to open returns the lowest unused entry in the file descriptor table. (Note that this may be zero, if you have closed stdin.)

The first argument (pathname) is obvious. The flags argument has a number of bits which are set with symbolic names. Exactly one of these three must be set.

O_RDONLY Open for reading only
O_WRONLY Open for writing only
O_RDWR Open for both reading and writing.

In addition, other bits may be set by using bitwise OR.
O_APPEND (append rather than overwrite)
O_CREAT (create the file if it doesn't exist)
O_EXCL generates an error if O_CREAT is specified and the file already exists
O_TRUNC if file exists, and opened for writing, truncate to zero

The optional third argument, mode, sets the permission bits of a newly created file. It is used only when the file is created, so whenever O_CREAT is specified you must supply it.

Each file has nine permission bits. Symbolic names are defined in sys/stat.h.

S_IRUSR owner read
S_IWUSR owner write
S_IXUSR owner execute
S_IRGRP group read
S_IWGRP group write
S_IXGRP group execute
S_IROTH other read
S_IWOTH other write
S_IXOTH other execute

Every process has a default file creation mask (the umask), which is a bitmask of permissions that should not be granted to newly created files. The permissions actually assigned are mode & ~umask.

The man page for open (man 2 open) has the complete details.

Here is the signature for the read system call.

 #include <unistd.h>

       ssize_t read(int fd, void *buf, size_t count);

The first argument is a file descriptor, which must have been previously opened with an open system call (or refer to one of the three default file descriptors which are automatically opened: stdin, stdout, and stderr). The second argument is a pointer to a buffer. The third argument is the maximum number of bytes to be read (the type size_t is an unsigned, i.e. never negative, integer type). This should always be less than or equal to the size of buf. Remember that if buf is a pointer rather than an array, sizeof gives the size of the pointer, not of the buffer, and strlen only measures a string that is already there, so you must keep track of the buffer size yourself.

A call to read will return -1 on an error, zero at end-of-file, or a positive integer, which indicates the number of bytes copied into buf. This is guaranteed to be equal to or less than count; it can be less even when more data will arrive later, for example when reading from a pipe or a terminal. The type ssize_t is a signed integer type.

The write system call writes data to a file. Here is the signature.

 #include <unistd.h>

       ssize_t write(int fd, const void *buf, size_t count);

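This will write up to count bytes from buf to the file indicated by the file descriptor fd. It returns either -1 on error, or the number of bytes written. For a regular file this is normally count, but it can be less (for example when the disk fills up, or when writing to a pipe or socket), so careful code checks the return value.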

If a file is opened for writing and it already has data, the existing data is not discarded (unless O_TRUNC is set), but it is overwritten starting at the beginning of the file.

fd = open("Somefile",O_WRONLY);
r = write(fd,"Here is some data",17);

The first 17 bytes are overwritten, the rest stays.

Each open file has associated with it a "current file offset" which points to the next byte to be read or written. This is initialized to 0 when the file is opened. If the O_APPEND flag is set, the offset is moved to the end of the file before each write.

This can be changed with the lseek function.

off_t lseek(int filedes, off_t offset, int whence);

whence can be:
SEEK_SET (0) which means offset from the start of the file
SEEK_CUR (1) which means offset from the current file offset
SEEK_END (2) which means offset from the end of the file

These are defined in unistd.h

Offset can be either positive or negative.

A successful call to lseek returns the new offset, measured from the start of the file, so we can find the current offset with this call:

curpos = lseek(fd, 0, SEEK_CUR);

We can rewind the file with lseek(fd, 0, SEEK_SET).

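Note that it is possible to create a file with a hole in it, by seeking past the end of the file and then writing. The bytes that were skipped read back as zeros.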

The system call int close(int fd) closes a file. The file descriptor table is of finite size, so if you have opened lots of files, you should close those that are no longer needed. When a process terminates, all open file descriptors are closed.

Implementation

The kernel uses three data structures to accomplish this.

Every process has an entry in the process table, and each entry has a table of open file descriptors. Each open file descriptor has:
file descriptor flags
a pointer to a file table entry

The kernel maintains a file table for all open files. Each entry contains:
the file status flags for the file (read, write, append, sync etc)
the current file offset
a pointer to the v-node table entry (information about the file)

It is possible for two (or more) processes to have the same file open, and it is possible for two separate file descriptors in the same process to point to the same file. The result is a possible race condition with unpredictable results if two processes write to the same file. (But note that if both have the O_APPEND flag set, no data will be lost).

The processes can override this by just closing the file descriptors that they don't need.

The file descriptor table is maintained across a fork and an exec. After a fork, the child gets a copy of the parent's file descriptor table, so both parent and child have the same open files. The two processes share the same file table entries, including the current file offset, so if either reads, the file offset is updated for both.

Here is a program which does the following:
opens the file Alphabet (which contains the 26 upper case letters in order)
forks
parent reads 5 bytes, waits, reads 5 bytes
child sleeps for 2 seconds, reads 5 bytes, terminates

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/wait.h>

int main()
{
     int fd, pid, r;
     char buffer[32];
   
     if ((fd=open("Alphabet",O_RDONLY)) < 0) {
         perror("error opening Alphabet");
         exit(1);
     }
     if ((pid=fork()) < 0) {
         perror("error on fork");
         exit(1);
     }
     if (pid==0) {  /* child */
         sleep(2);
         r=read(fd,buffer,5);  /* continues where the parent stopped */
         write(STDOUT_FILENO,"Child reads ",12);
         if (r > 0) write(STDOUT_FILENO,buffer,r);
         write(STDOUT_FILENO,"\n",1);
         exit(0);
     }
     else { /* parent */
         r=read(fd,buffer,5);
         write(STDOUT_FILENO,"Parent reads ",13);
         if (r > 0) write(STDOUT_FILENO,buffer,r);
         write(STDOUT_FILENO,"\n",1);
         wait(NULL);  /* wait for the child to terminate */
         r=read(fd,buffer,5);  /* continues where the child stopped */
         write(STDOUT_FILENO,"Parent reads ",13);
         if (r > 0) write(STDOUT_FILENO,buffer,r);
         write(STDOUT_FILENO,"\n",1);
     }
     return 0;
}

Parent reads ABCDE
Child reads FGHIJ
Parent reads KLMNO

Pipes

A pipe is a connection between two processes in which one process writes data to the pipe and the other reads from the pipe. Thus, it allows one process to pass data to another process.

The Unix system call to create a pipe is
int pipe(int fd[2]);
This function takes an array of two ints as an argument and fills it in with two file descriptors. It creates a pipe with fd[0] at one end and fd[1] at the other. Reading from the pipe and writing to the pipe are done with the read and write calls. On some systems (Solaris, for example) both ends are opened for both reading and writing, but this is not portable; by convention a process writes to fd[1] and reads from fd[0]. A pipe is normally useful only if the process calls fork after creating the pipe, so that parent and child both hold its descriptors. Each process should close the end of the pipe that it is not using. Here is a simple example in which a child sends a message to its parent through a pipe.

#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

int main()
{
  pid_t pid;
  int retval;
  int fd[2];
  int n;

  retval = pipe(fd);
  if (retval < 0) {
    printf("Pipe failed\n"); /* pipe is unlikely to fail */
    exit(1);
  }

  pid = fork();
  if (pid < 0) {
    perror("Fork failed");
    exit(1);
  }
  if (pid == 0) { /* child */
    close(fd[0]);  /* the child only writes */
    n = write (fd[1],"Hello from the child",20);
    exit(0);
  }
  else { /* parent */
    char buffer[64];
    close(fd[1]);  /* the parent only reads */
    n = read(fd[0],buffer,sizeof(buffer)-1); /* leave room for '\0' */
    if (n < 0) {
      perror("Read failed");
      exit(1);
    }
    buffer[n]='\0';
    printf("I got your message: %s\n",buffer);
  }
  return 0;
}

There is no need for the parent to wait for the child to finish because reading from a pipe will block until there is something in the pipe to read. If the parent runs first, it will try to execute the read statement, and will immediately block because there is nothing in the pipe. After the child writes a message to the pipe, the parent will wake up.

Pipes have a fixed capacity (4096 bytes on many older systems, larger on some newer ones), and if a process tries to write to a pipe which is full, the write will block until a process reads some data from the pipe.

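The system call dup2(oldfd, newfd) makes the descriptor newfd refer to the same open file as oldfd, first closing newfd if it was open. Here is a program which combines dup2 and pipe to redirect the output of the ls process to the input of the more process as would be the case if the user typed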
ls | more
at the Unix command line.

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

void error(const char *msg)
{
     perror(msg);
     exit(1);
}

int main()
{
    int p[2], retval;
    retval = pipe(p);
    if (retval < 0) error("pipe");
    retval=fork();
    if (retval < 0) error("forking");
    if (retval==0) { /* child */
          dup2(p[1],1); /* redirect stdout to pipe */
          close(p[0]);  /* don't permit this process to read from pipe */
          close(p[1]);  /* stdout now refers to the pipe */
          execl("/bin/ls","ls","-l",(char *)NULL);
          error("Exec of ls");
       }
    /* if we get here, we are the parent */ 
     dup2(p[0],0);  /* redirect stdin to pipe */
     close(p[1]);  /* don't permit this process to write to pipe */
     close(p[0]);  /* stdin now refers to the pipe */
     execl("/bin/more","more",(char *)NULL);
     error("Exec of more");
     return 0;
}

Signals

Recall that an interrupt is an asynchronous event which can happen at any time. When an interrupt occurs, the processor stops executing instructions in the current running process and executes an interrupt handler function in the kernel. Unix systems have a software interrupt mechanism called signals.

An example of a signal that you are probably familiar with is an interrupt signal which is sent by the user to a running process when the user enters Control-C. The default action of this signal is to kill the process.

A signal is represented as an integer. These integers are assigned symbolic names in the header file signal.h. The interrupt signal has the value 2 but you should use the symbolic name SIGINT.

Every signal has a default action. The default action for SIGINT is to terminate the program. A program can modify the default action for most signals, or it can choose to ignore a signal.

The system call which does this has the following function prototype.

void (*signal (int sig, void (*disp)(int)))(int);

This says that the function signal takes two arguments. The first, sig, is a signal number, and the second, disp, is a pointer to a handler function (in C, the name of a function). The handler takes one argument, an integer (the signal number), and returns nothing. The call to signal changes the signal handling function for its first argument from the default to the function given as its second argument, and returns a pointer to the previous handler.

Here is a simple example.

#include <signal.h>
#include <stdio.h>
#include <unistd.h>

void SigCatcher(int n)
{
    /* printf is used for simplicity; it is not guaranteed
       to be safe inside a signal handler */
    printf("Ha Ha, you can't kill me\n");
    signal(SIGINT,SigCatcher);   /* reinstall the handler */
}

int main()
{
    int i;

    signal(SIGINT,SigCatcher);
    for (i=0;i<10;i++) {
        sleep(1);
        printf("Just woke up, i is %d\n",i);
    }
    return 0;
}

The main function calls signal to change the default action to the function SigCatcher then enters a loop where it alternately sleeps for one second, then displays a message on stdout. Normally, the user could kill this program by hitting Control-C while it was running, but because the default signal action has changed, when the user hits Control-C while this program is running, instead of the program dying, it displays the message
Ha Ha, you can't kill me
Try it.

Notice that the signal handler function calls signal. On some Unix systems, once a signal handler has been called, the system resets the action for that signal to the default, so the handler must install itself again if it is to catch the next signal.

Here is a list of the predefined signals on Solaris (there are some slight differences from one Unix system to another).

#define	SIGHUP	1	/* hangup */
#define	SIGINT	2	/* interrupt (rubout) */
#define	SIGQUIT	3	/* quit (ASCII FS) */
#define	SIGILL	4	/* illegal instruction (not reset when caught) */
#define	SIGTRAP	5	/* trace trap (not reset when caught) */
#define	SIGIOT	6	/* IOT instruction */
#define	SIGABRT 6	/* used by abort, replace SIGIOT in the future */
#define	SIGEMT	7	/* EMT instruction */
#define	SIGFPE	8	/* floating point exception */
#define	SIGKILL	9	/* kill (cannot be caught or ignored) */
#define	SIGBUS	10	/* bus error */
#define	SIGSEGV	11	/* segmentation violation */
#define	SIGSYS	12	/* bad argument to system call */
#define	SIGPIPE	13	/* write on a pipe with no one to read it */
#define	SIGALRM	14	/* alarm clock */
#define	SIGTERM	15	/* software termination signal from kill */
#define	SIGUSR1	16	/* user defined signal 1 */
#define	SIGUSR2	17	/* user defined signal 2 */
#define	SIGCLD	18	/* child status change */
#define	SIGCHLD	18	/* child status change alias (POSIX) */
#define	SIGPWR	19	/* power-fail restart */
#define	SIGWINCH 20	/* window size change */
#define	SIGURG	21	/* urgent socket condition */
#define	SIGPOLL 22	/* pollable event occured */
#define	SIGIO	SIGPOLL	/* socket I/O possible (SIGPOLL alias) */
#define	SIGSTOP 23	/* stop (cannot be caught or ignored) */
#define	SIGTSTP 24	/* user stop requested from tty */
#define	SIGCONT 25	/* stopped process has been continued */
#define	SIGTTIN 26	/* background tty read attempted */
#define	SIGTTOU 27	/* background tty write attempted */
#define	SIGVTALRM 28	/* virtual timer expired */
#define	SIGPROF 29	/* profiling timer expired */
#define	SIGXCPU 30	/* exceeded cpu limit */
#define	SIGXFSZ 31	/* exceeded file size limit */
#define	SIGWAITING 32	/* process's lwps are blocked */
#define	SIGLWP	33	/* special signal used by thread library */
#define	SIGFREEZE 34	/* special signal used by CPR */
#define	SIGTHAW 35	/* special signal used by CPR */
#define	SIGCANCEL 36	/* thread cancellation signal used by libthread */
#define	SIGLOST	37	/* resource lost (eg, record-lock lost) */

Signal 11, SIGSEGV, is the signal that a process receives when it makes an invalid memory reference (a segmentation fault). The default action for this is to display the message
Segmentation Fault (core dumped)
dump the core, and terminate the program.

You can change the action for this so that it displays a different message, but of course you cannot try to continue to run the program.

Signal 9, SIGKILL, is the kill signal. A program is not allowed to change the signal handler for this signal. Otherwise, it would be possible for a program to change all of its signal handlers so that no one could kill a rogue program. To send a kill signal from the shell to a particular process, enter
kill -9 ProcessNumber

Signal 14, SIGALRM, is the alarm signal. The default action for SIGALRM is to terminate the program, but this can be changed. The system call
unsigned int alarm(unsigned int sec);
sends a SIGALRM signal to the calling process after sec seconds. If you have changed the signal handler function for this, then you can arrange for an event to happen after a set period of time.

You can choose to ignore any signal (except SIGKILL and SIGSTOP) by using SIG_IGN as the second argument of signal. You can also reset the signal handler for a particular signal to its default by using SIG_DFL as the second argument to signal.

Killing Zombies

Recall that if a child dies before its parent calls wait, the child becomes a zombie. In some applications, a web server for example, the parent forks off lots of children but doesn't care whether the child is dead or alive. For example, a web server might fork a new process to handle each connection, and each child dies when the client breaks the connection. Such an application is at risk of producing many zombies, and zombies can clog up the process table.

When a child dies, it sends a SIGCHLD signal to its parent. The parent process can prevent zombies from being created by creating a signal handler routine for SIGCHLD which calls wait whenever it receives a SIGCHLD signal. There is no danger that this will cause the parent to block because it would only call wait when it knows that a child has just died.

There are several versions of wait on a Unix system. The system call waitpid has this prototype

#include <sys/types.h>
#include <sys/wait.h>

pid_t waitpid(pid_t pid, int *stat_loc, int options);

This will function like wait in that it waits for a child to terminate, but this function allows the process to wait for a particular child by setting its first argument to the pid that we want to wait for. However, that is not our interest here. If the first argument is set to -1, it will wait for any child to terminate, just like wait (a value of 0 waits for any child in the caller's process group). The third argument can be set to WNOHANG. This will cause the function to return immediately, with a return value of 0, if no child has terminated yet. It is customary to use this function rather than wait in the signal handler.

Here is some sample code

#include <sys/types.h>
#include <stdio.h>
#include <signal.h>
#include <sys/wait.h>
#include <unistd.h>

void zombiekiller(int n)
{
  int status;
  /* -1 means any child; loop because several children
     may have died before the handler ran */
  while (waitpid(-1,&status,WNOHANG) > 0)
    ;
  signal(SIGCHLD,zombiekiller);  
}
int main()
{
  signal(SIGCHLD, zombiekiller);
  ....
}

File Locking

It is often the case that several processes or threads would like to access the same file concurrently. As long as they are reading the file, this is not a problem, but if several processes are trying to modify the file at the same time, changes can be missed.

Consider the simple and common problem where a user opens a file in a text editor, makes some changes to the file, and then saves the file. If a second process opens the same file for editing, makes some different changes, and then saves the file, one set of changes will be lost depending on the order in which the two processes saved the file.

Operating systems have a mechanism of file locking to prevent this occurrence.

Unix

Unix has two system calls for locking files. The earlier one is flock.

     #include <sys/file.h>
     int flock( int fd,  int operation);

The first argument is a file descriptor returned from a call to open. The second is one of LOCK_SH, LOCK_EX or LOCK_UN, optionally combined with LOCK_NB using bitwise OR.

LOCK_SH  /* shared lock */
LOCK_EX  /* exclusive lock */
LOCK_UN  /* Unlock a file */
LOCK_NB  /* Nonblocking: return an error instead of blocking */

A process that wants to write to a potentially shared file first calls flock with LOCK_EX as its second argument. If another process or thread has already locked the file with flock this process will block until the other process unlocks the file. To unlock a file, use LOCK_UN as the second argument.

This file lock is an advisory lock. This means that a rogue process can ignore the lock and write to a locked file. However, if all of the processes which could potentially modify a file use this mechanism, it can prevent data loss as a result of contention.

The flock call locks the whole file, and this can produce serious performance problems if there are many processes trying to update a database concurrently. The newer Unix file locking call is lockf. This locks only a portion of the file, a single record in a database for example, rather than the entire file, so that many concurrent processes can update a database at the same time as long as they are not trying to update the same record.

Here is the function prototype

     #include <unistd.h>
     int lockf(int fildes, int function, off_t size);

The first argument is the file descriptor, the second argument is one of the following:

#define   F_ULOCK   0   /* unlock previously locked section */
#define   F_LOCK    1   /* lock section for exclusive use */
#define   F_TLOCK   2   /* test & lock section for exclusive use */
#define   F_TEST    3   /* test section for other locks */

The two of these that you are most likely to need are F_LOCK to lock the file and F_ULOCK to unlock it.

The third argument is the number of bytes to lock. These start at the current file offset, i.e. the next place in the file where a read or write will occur.

This is also an advisory rather than a mandatory lock. Both of these will put the calling process or thread to sleep if the file or section of the file is locked, and awaken the process or thread when the record is unlocked.

More Windows APIs

The API to open a file in Windows is the inappropriately named CreateFile.

HANDLE CreateFile(
  LPCTSTR lpFileName,          // pointer to name of the file
  DWORD dwAccess,              // access (read-write) mode
  DWORD dwShareMode,           // share mode
  LPSECURITY_ATTRIBUTES lpSecurityAttributes,
                               // pointer to security attributes
  DWORD dwCreate,              // how to create
  DWORD dwFlagsAndAttributes,  // file attributes
  HANDLE hTemplateFile         // handle to file with attributes to 
                               // copy
);

dwAccess can be either GENERIC_READ or GENERIC_WRITE or both.

dwShareMode can be

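0 (no sharing). This is not advisory, this is enforced by the OS.
FILE_SHARE_READ (Other processes can open the file for reading but not for writing)
FILE_SHARE_WRITE (Other processes can open the file for writing; combine with FILE_SHARE_READ using bitwise OR to permit both)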

dwCreate can have these values

CREATE_NEW - fails if the file already exists
CREATE_ALWAYS - an existing file will be overwritten
OPEN_EXISTING - fails if the file does not exist
OPEN_ALWAYS -Open the file, creating it if it does not exist
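TRUNCATE_EXISTING - opens an existing file and truncates it to zero length; fails if the file does not exist. The file must be opened with GENERIC_WRITE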

To close a file, use the generic API BOOL CloseHandle(HANDLE hObject)

Here is the API for reading a file.

BOOL ReadFile(
  HANDLE hFile,                // handle of file to read
  LPVOID lpBuffer,             // pointer to buffer that receives data
  DWORD nNumberOfBytesToRead,  // number of bytes to read
  LPDWORD lpNumberOfBytesRead, // pointer to number of bytes read
  LPOVERLAPPED lpOverlapped    // pointer to structure for overlapped I/O
);

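This works much like the Unix read function, except that the return value only indicates success (nonzero) or failure (zero), and the number of bytes read is stored through the fourth argument. The last argument should be NULL for ordinary (non-overlapped) I/O.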

Here is the API for writing to a file.

BOOL WriteFile(
  HANDLE hFile,                    // handle to file to write to
  LPCVOID lpBuffer,                // pointer to data to write to file
  DWORD nNumberOfBytesToWrite,     // number of bytes to write
  LPDWORD lpNumberOfBytesWritten,  // pointer to number of bytes written
  LPOVERLAPPED lpOverlapped        // pointer to structure for overlapped I/O
);
