You have used many I/O library operations: cin and cout among others in C++, and printf, scanf, fgets, etc. in C. On Unix, all of these are ultimately built on a small set of system calls, chiefly open(), read(), and write().
The open() system call opens a file. Here is the prototype.

#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>

int open(const char *pathname, int flags);
int open(const char *pathname, int flags, mode_t mode);

Every running process has a table of open file descriptors. A successful call to open returns the lowest unused entry in the file descriptor table. (Note that this may be zero if you have closed stdin.) On failure, open returns -1 and sets errno.
The first argument (pathname) is the name of the file to open. The flags argument is a set of bits, each of which has a symbolic name. Exactly one of these three must be set.
O_RDONLY Open for reading only
O_WRONLY Open for writing only
O_RDWR Open for both reading and writing.
In addition, other bits may be set by using bitwise OR.
O_APPEND Append to the end of the file rather than overwriting
O_CREAT Create the file if it doesn't exist
O_EXCL Generate an error if O_CREAT is specified and the file already exists
O_TRUNC If the file exists and is opened for writing, truncate it to zero length
The optional third argument, mode, sets the permission bits of a newly created file. It must be supplied whenever O_CREAT is specified; if it is omitted, the new file gets unpredictable permissions.
Each file has nine permission bits. Symbolic names are defined in sys/stat.h.
S_IRUSR owner read
S_IWUSR owner write
S_IXUSR owner execute
S_IRGRP group read
S_IWGRP group write
S_IXGRP group execute
S_IROTH other read
S_IWOTH other write
S_IXOTH other execute
Every process has a default file creation mask (the umask), which is a bitmask of permissions that should not be granted to newly created files. The permissions actually granted are the bits in mode that are not set in the mask.
The complete details are in the man page for open (man 2 open).
Here is the signature for the read system call.

#include <unistd.h>

ssize_t read(int fd, void *buf, size_t count);

The first argument is a file descriptor, which must have been previously opened with an open system call (or be one of the three default file descriptors that are opened automatically: stdin, stdout, and stderr). The second argument is a buffer that receives the data. The third argument is the maximum number of bytes to be read (the type size_t is an unsigned integer type, so it is never negative). This should always be less than or equal to the size of buf. Remember that when buf is a pointer rather than an array declared in the current scope, sizeof gives the size of the pointer, and strlen measures the string currently stored rather than the capacity, so neither gives the size of the buffer.
A call to read returns -1 on an error, zero at end-of-file, or a positive integer, which is the number of bytes copied into buf. This is guaranteed to be less than or equal to count, and it can be less than count even before end-of-file (for example, when reading from a pipe or a terminal). The type ssize_t is a signed integer type.
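Because read may return fewer bytes than requested, code that needs a full buffer usually calls it in a loop. The following is a minimal sketch of such a loop; read_full is a hypothetical helper name, and retrying on EINTR (a read interrupted by a signal) is one common policy, not the only one.

```c
#include <errno.h>
#include <unistd.h>

/* Read until count bytes have arrived, end-of-file is reached, or an
 * error occurs. Returns the number of bytes stored in buf (less than
 * count only at end-of-file), or -1 on error. */
ssize_t read_full(int fd, void *buf, size_t count)
{
    char *p = buf;
    size_t total = 0;

    while (total < count) {
        ssize_t n = read(fd, p + total, count - total);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: retry */
            return -1;
        }
        if (n == 0)
            break;              /* end-of-file */
        total += (size_t)n;
    }
    return (ssize_t)total;
}
```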
The write system call writes data to a file. Here is the signature.
#include <unistd.h> ssize_t write(int fd, const void *buf, size_t count);
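This writes up to count bytes from buf to the file indicated by the file descriptor fd. It returns -1 on error, or the number of bytes written. For a regular file this is normally count, but it can be less (for example, when the disk fills up, or when writing to a pipe or socket), so the return value should be checked.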
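A common way to deal with short writes is to keep calling write until everything has been sent. This is a sketch under that assumption; write_all is a hypothetical helper name.

```c
#include <errno.h>
#include <unistd.h>

/* Write exactly count bytes from buf, repeating the call if write
 * accepts only part of the data. Returns 0 on success, -1 on error. */
int write_all(int fd, const void *buf, size_t count)
{
    const char *p = buf;

    while (count > 0) {
        ssize_t n = write(fd, p, count);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: retry */
            return -1;
        }
        p += n;                 /* skip past the bytes already written */
        count -= (size_t)n;
    }
    return 0;
}
```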
If a file is opened for writing without O_TRUNC and it already contains data, the existing data is not discarded; it is overwritten only where new data is written.
fd = open("Somefile", O_WRONLY);
r = write(fd, "Here is some data", 17);
The first 17 bytes are overwritten; the rest of the file is unchanged.
Each open file has associated with it a "current file offset" which points to the next byte to be read or written. This is initialized to 0 when the file is opened. If the O_APPEND flag is set, the offset is moved to the end of the file before each write.
The offset can be changed with the lseek function.
off_t lseek(int filedes, off_t offset, int whence);
whence can be:
SEEK_SET (0) which means offset from the start of the file
SEEK_CUR (1) which means offset from the current file offset
SEEK_END (2) which means offset from the end of the file
These are defined in unistd.h
Offset can be either positive or negative.
A successful call to lseek returns the new offset, so we can find the current offset with this call:
curpos = lseek(fd, 0, SEEK_CUR);
We can rewind the file with lseek(fd, 0, SEEK_SET).
Note that it is possible to create a file with a hole in it by seeking past the end of the file and then writing. The bytes in the gap read back as zeros.
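The following sketch shows two uses of lseek: finding the size of a file by seeking to its end, and creating a hole by seeking beyond the end before writing. The names file_size and write_with_hole are hypothetical helpers, not standard functions.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Return the size of an open file without disturbing its current
 * offset, or -1 on error. */
off_t file_size(int fd)
{
    off_t here = lseek(fd, 0, SEEK_CUR);    /* remember where we are */
    off_t end;

    if (here == (off_t)-1)
        return -1;
    end = lseek(fd, 0, SEEK_END);           /* offset of the end = size */
    if (end == (off_t)-1)
        return -1;
    if (lseek(fd, here, SEEK_SET) == (off_t)-1)
        return -1;
    return end;
}

/* Write one byte at offset 0 and one byte at offset gap, leaving a
 * hole in between. Returns 0 on success, -1 on error. */
int write_with_hole(int fd, off_t gap)
{
    if (lseek(fd, 0, SEEK_SET) == (off_t)-1 || write(fd, "A", 1) != 1)
        return -1;
    if (lseek(fd, gap, SEEK_SET) == (off_t)-1 || write(fd, "B", 1) != 1)
        return -1;
    return 0;
}
```

After write_with_hole(fd, 1000) on an empty file, file_size(fd) reports 1001 bytes even though only two bytes were ever written.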
The system call int close(int fd) closes a file. The file descriptor table is of finite size, so if you have opened lots of files, you should close those that are no longer needed. When a process terminates, all open file descriptors are closed.
The kernel uses three data structures to keep track of open files.
Every process has an entry in the process table, and each entry has a table of open file descriptors. Each open file descriptor has
file descriptor flags
a pointer to a file table entry
The kernel maintains a file table for all open files. Each entry contains:
the file status flags for the file (read, write, append, sync etc)
the current file offset
a pointer to the v-node table entry (information about the file)
It is possible for two (or more) processes to have the same file open, and it is possible for two separate file descriptors in the same process to point to the same file. The result is a possible race condition with unpredictable results if two processes write to the same file. (But note that if both have the O_APPEND flag set, no data will be lost, because each write is placed at the current end of the file.)
Processes can avoid unintended sharing by closing the file descriptors that they don't need.
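Whether two descriptors share a current file offset depends on whether they point to the same file table entry. The sketch below tests this by moving the offset through one descriptor and looking at it through the other; shares_offset is a hypothetical helper, and both descriptors are assumed to refer to seekable files.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns 1 if fd_a and fd_b share a current file offset (that is,
 * they point to the same file table entry), 0 if they do not, and
 * -1 on error. The offset of fd_a is restored before returning. */
int shares_offset(int fd_a, int fd_b)
{
    off_t a = lseek(fd_a, 0, SEEK_CUR);
    off_t b = lseek(fd_b, 0, SEEK_CUR);
    off_t b_after;

    if (a == (off_t)-1 || b == (off_t)-1)
        return -1;
    if (lseek(fd_a, a + 1, SEEK_SET) == (off_t)-1)  /* move fd_a by one */
        return -1;
    b_after = lseek(fd_b, 0, SEEK_CUR);             /* did fd_b move too? */
    lseek(fd_a, a, SEEK_SET);                       /* put fd_a back */
    return b_after != b;
}
```

A descriptor obtained with dup(fd) shares the file table entry of fd, so shares_offset reports 1 for that pair. A descriptor obtained by calling open a second time on the same pathname gets its own file table entry, so the result for that pair is 0 even though both refer to the same file.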
The file descriptor table is maintained across a fork and an exec. After a fork, the child has a copy of the parent's file descriptor table, so both parent and child have the same open files. The two processes share the same file table entries, including the current file offset, so if either reads, the file offset is updated for both.
Here is a program which does the following:
opens the file Alphabet (which contains the 26 upper case letters in order)
forks
parent reads 5 bytes, waits, reads 5 bytes
child sleeps for 2 seconds, reads 5 bytes, terminates

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/wait.h>

int main()
{
    int fd, pid, r;
    char buffer[32];

    if ((fd = open("Alphabet", O_RDONLY)) < 0) {
        perror("error opening Alphabet");
        exit(1);
    }
    if ((pid = fork()) < 0) {
        perror("error on fork");
        exit(1);
    }
    if (pid == 0) { /* child */
        sleep(2);
        r = read(fd, buffer, 5);
        write(STDOUT_FILENO, "Child reads ", 12);
        write(STDOUT_FILENO, buffer, 5);
        write(STDOUT_FILENO, "\n", 1);
        exit(0);
    } else { /* parent */
        r = read(fd, buffer, 5);
        write(STDOUT_FILENO, "Parent reads ", 13);
        write(STDOUT_FILENO, buffer, 5);
        write(STDOUT_FILENO, "\n", 1);
        wait(NULL); /* wait for the child to terminate */
        r = read(fd, buffer, 5);
        write(STDOUT_FILENO, "Parent reads ", 13);
        write(STDOUT_FILENO, buffer, 5);
        write(STDOUT_FILENO, "\n", 1);
    }
    return 0;
}

Here is the output.

Parent reads ABCDE
Child reads FGHIJ
Parent reads KLMNO

Pipes
A pipe is a connection between two processes in which one process writes data to the pipe and the other reads from the pipe. Thus, it allows one process to pass data to another process.
The Unix system call to create a pipe is
int pipe(int fd[2])
This function takes an array of two ints (file descriptors) as an argument. It creates a pipe with fd[0] at one end and fd[1] at the other. Reading from the pipe and writing to the pipe are done with the read and write calls.
On some systems (Solaris, for example) both ends are opened for both reading and writing, but on most systems fd[0] can only be read and fd[1] can only be written, so a process should always write to fd[1] and read from fd[0].
Pipes only make sense if the process calls fork after creating
the pipe. Each process should close the end of the pipe that
it is not using. Here is a simple example in which a child
sends a message to its parent through a pipe.

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>

int main()
{
    pid_t pid;
    int retval;
    int fd[2];
    ssize_t n;

    retval = pipe(fd);
    if (retval < 0) {
        printf("Pipe failed\n"); /* pipe is unlikely to fail */
        exit(1);
    }
    pid = fork();
    if (pid == 0) { /* child */
        close(fd[0]);
        n = write(fd[1], "Hello from the child", 20);
        exit(0);
    } else if (pid > 0) { /* parent */
        char buffer[64];
        close(fd[1]);
        n = read(fd[0], buffer, sizeof(buffer) - 1); /* leave room for '\0' */
        if (n < 0)
            n = 0; /* treat a read error as an empty message */
        buffer[n] = '\0';
        printf("I got your message: %s\n", buffer);
    }
    return 0;
}

There is no need for the parent to wait for the child to finish because reading from a pipe will block until there is something in the pipe to read. If the parent runs first, it will try to execute the read statement, and will immediately block because there is nothing in the pipe. After the child writes a message to the pipe, the parent will wake up.
Pipes have a fixed capacity (historically often 4096 bytes; 64 KB by default on current Linux systems), and if a process tries to write to a pipe which is full, the write will block until a process reads some data from the pipe.
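The capacity of a pipe on a particular system can be measured by filling one until a write would block. The sketch below does this by putting the write end into non-blocking mode with fcntl, which is an addition not covered in these notes: in that mode a write to a full pipe fails with EAGAIN instead of putting the caller to sleep. The name pipe_capacity is hypothetical.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Measure how many bytes a fresh pipe accepts before a write would
 * block. Returns the byte count, or -1 on error. */
long pipe_capacity(void)
{
    int fd[2];
    char chunk[512];
    long total = 0;
    int flags;

    if (pipe(fd) < 0)
        return -1;
    flags = fcntl(fd[1], F_GETFL);
    if (flags < 0 || fcntl(fd[1], F_SETFL, flags | O_NONBLOCK) < 0) {
        close(fd[0]);
        close(fd[1]);
        return -1;
    }
    memset(chunk, 'x', sizeof(chunk));
    for (;;) {
        ssize_t n = write(fd[1], chunk, sizeof(chunk));
        if (n > 0) {
            total += n;
        } else if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
            break;              /* the pipe is full */
        } else if (n < 0 && errno == EINTR) {
            continue;           /* interrupted by a signal: retry */
        } else {
            total = -1;         /* unexpected failure */
            break;
        }
    }
    close(fd[0]);
    close(fd[1]);
    return total;
}
```

The result varies from one system to another, which is why programs should not depend on a particular pipe size.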
Here is a program which combines dup2 and
pipe to redirect the output of the ls process
to the input of the more process as would be the case
if the user typed
ls | more
at the Unix command line.

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

void error(const char *msg)
{
    perror(msg);
    exit(1);
}

int main()
{
    int p[2], retval;

    retval = pipe(p);
    if (retval < 0) error("pipe");
    retval = fork();
    if (retval < 0) error("forking");
    if (retval == 0) { /* child */
        dup2(p[1], 1); /* redirect stdout to pipe */
        close(p[0]);   /* don't permit this process to read from pipe */
        close(p[1]);   /* the copy made by dup2 stays open as stdout */
        execl("/bin/ls", "ls", "-l", (char *)NULL);
        error("Exec of ls");
    }
    /* if we get here, we are the parent */
    dup2(p[0], 0); /* redirect stdin to pipe */
    close(p[1]);   /* don't permit this process to write to pipe */
    close(p[0]);   /* the copy made by dup2 stays open as stdin */
    execl("/bin/more", "more", (char *)NULL);
    error("Exec of more");
    return 0;
}

Recall that an interrupt is an asynchronous event which can happen at any time. When an interrupt occurs, the processor stops executing instructions in the current running process and executes an interrupt handler function in the kernel. Unix systems have a software interrupt mechanism called signals.
An example of a signal that you are probably familiar with is an interrupt signal which is sent by the user to a running process when the user enters Control-C. The default action of this signal is to kill the process.
A signal is represented as an integer. These integers are assigned symbolic names in the header file signal.h. The interrupt signal has the value 2 but you should use the symbolic name SIGINT.
Every signal has a default action. The default action for SIGINT is to abort the program. A program can modify the default action for most signals, or it can choose to ignore a signal.
The system call which does this has the following function prototype.
void (*signal (int sig, void (*disp)(int)))(int);
This says that the function signal takes two arguments. The first, sig, is a signal number, and the second, disp, is the name of a function. This function takes one argument, an integer, and returns nothing. The call to signal changes the signal handling function for its first argument from the default to the function given as its second argument, and returns the previous handler.
Here is a simple example.

#include <signal.h>
#include <stdio.h>
#include <unistd.h>

void SigCatcher(int n)
{
    printf("Ha Ha, you can't kill me\n");
    signal(SIGINT, SigCatcher);
}

int main()
{
    int i;

    signal(SIGINT, SigCatcher);
    for (i = 0; i < 10; i++) {
        sleep(1);
        printf("Just woke up, i is %d\n", i);
    }
    return 0;
}

The main function calls signal to change the default action to the function SigCatcher, then enters a loop where it alternately sleeps for one second, then displays a message on stdout. Normally, the user could kill this program by hitting Control-C while it was running, but because the default signal action has changed, when the user hits Control-C while this program is running, instead of the program dying, it displays the message
Ha Ha, you can't kill me
Notice that the signal handler function calls signal. On some Unix systems, once a signal handler has been called, the system resets the handler to the default unless it is reset again.
Here is a list of the predefined signals on Solaris (there are some slight differences from one Unix system to another).

#define SIGHUP 1 /* hangup */
#define SIGINT 2 /* interrupt (rubout) */
#define SIGQUIT 3 /* quit (ASCII FS) */
#define SIGILL 4 /* illegal instruction (not reset when caught) */
#define SIGTRAP 5 /* trace trap (not reset when caught) */
#define SIGIOT 6 /* IOT instruction */
#define SIGABRT 6 /* used by abort, replace SIGIOT in the future */
#define SIGEMT 7 /* EMT instruction */
#define SIGFPE 8 /* floating point exception */
#define SIGKILL 9 /* kill (cannot be caught or ignored) */
#define SIGBUS 10 /* bus error */
#define SIGSEGV 11 /* segmentation violation */
#define SIGSYS 12 /* bad argument to system call */
#define SIGPIPE 13 /* write on a pipe with no one to read it */
#define SIGALRM 14 /* alarm clock */
#define SIGTERM 15 /* software termination signal from kill */
#define SIGUSR1 16 /* user defined signal 1 */
#define SIGUSR2 17 /* user defined signal 2 */
#define SIGCLD 18 /* child status change */
#define SIGCHLD 18 /* child status change alias (POSIX) */
#define SIGPWR 19 /* power-fail restart */
#define SIGWINCH 20 /* window size change */
#define SIGURG 21 /* urgent socket condition */
#define SIGPOLL 22 /* pollable event occurred */
#define SIGIO SIGPOLL /* socket I/O possible (SIGPOLL alias) */
#define SIGSTOP 23 /* stop (cannot be caught or ignored) */
#define SIGTSTP 24 /* user stop requested from tty */
#define SIGCONT 25 /* stopped process has been continued */
#define SIGTTIN 26 /* background tty read attempted */
#define SIGTTOU 27 /* background tty write attempted */
#define SIGVTALRM 28 /* virtual timer expired */
#define SIGPROF 29 /* profiling timer expired */
#define SIGXCPU 30 /* exceeded cpu limit */
#define SIGXFSZ 31 /* exceeded file size limit */
#define SIGWAITING 32 /* process's lwps are blocked */
#define SIGLWP 33 /* special signal used by thread library */
#define SIGFREEZE 34 /* special signal used by CPR */
#define SIGTHAW 35 /* special signal used by CPR */
#define SIGCANCEL 36 /* thread cancellation signal used by libthread */
#define SIGLOST 37 /* resource lost (eg, record-lock lost) */

Signal 11, SIGSEGV, is the signal that a program receives when it causes a segmentation fault (an invalid memory access). The default action for this is to terminate the program, and the shell displays the message
Segmentation Fault (core dumped)
You can change the action for this so that it displays a different message, but of course you cannot try to continue to run the program.
Signal 9, SIGKILL, is the kill signal. A program is not
allowed to change the signal handler for this signal.
Otherwise, it would be possible for a program to change
all of its signal handlers so that no one could kill
a rogue program. To send a kill signal from the shell
to a particular process, enter
kill -9 ProcessNumber
Signal 14 SIGALRM sends an alarm to a process. The
default SIGALRM handler is to abort the program, but this
can be changed. The system call
unsigned int alarm(unsigned int sec);
sends a SIGALRM signal to the process after sec
seconds. If you have changed the signal handler
function for this, then you can arrange for an event
to happen after a set period of time.
You can choose to ignore any signal (except SIGKILL and SIGSTOP) by using SIG_IGN as the second argument of signal. You can also reset the signal handler for a particular signal to its default by using SIG_DFL as the second argument to signal.
Killing Zombies
Recall that if a child dies before its parent calls wait, the child becomes a zombie. In some applications, a web server for example, the parent forks off lots of children but doesn't care whether the child is dead or alive. For example, a web server might fork a new process to handle each connection, and each child dies when the client breaks the connection. Such an application is at risk of producing many zombies, and zombies can clog up the process table.
When a child dies, it sends a SIGCHLD signal to its parent. The parent process can prevent zombies from being created by creating a signal handler routine for SIGCHLD which calls wait whenever it receives a SIGCHLD signal. There is no danger that this will cause the parent to block because it would only call wait when it knows that a child has just died.
There are several versions of wait on a Unix system. The system call waitpid has this prototype

#include <sys/types.h>
#include <sys/wait.h>

pid_t waitpid(pid_t pid, int *stat_loc, int options);

This functions like wait in that it waits for a child to terminate, but it allows the process to wait for a particular child by setting its first argument to the pid that we want to wait for. However, that is not our interest here. If the first argument is set to -1, it will wait for any child to terminate, just like wait. (A first argument of zero waits only for children in the caller's own process group.) The third argument can be set to WNOHANG, which causes the function to return immediately if there are no dead children. It is customary to use this function rather than wait in the signal handler.
Here is some sample code

#include <sys/types.h>
#include <stdio.h>
#include <signal.h>
#include <sys/wait.h>
#include <unistd.h>

void zombiekiller(int n)
{
    int status;

    waitpid(-1, &status, WNOHANG); /* -1: collect any terminated child */
    signal(SIGCHLD, zombiekiller); /* reinstall the handler */
}

int main()
{
    signal(SIGCHLD, zombiekiller);
    ....
}

File Locking
It is often the case that several processes or threads would like to access the same file concurrently. As long as they are reading the file, this is not a problem, but if several processes are trying to modify the file at the same time, changes can be missed.
Consider the simple and common problem where a user opens a file in a text editor, makes some changes to the file, and then saves the file. If a second process opens the same file for editing, makes some different changes, and then saves the file, one set of changes will be lost depending on the order in which the two processes saved the file.
Operating systems have a mechanism of file locking to prevent this occurrence.
Unix
Unix has two system calls for locking files. The earlier one is flock.

#include <sys/file.h>

int flock(int fd, int operation);

The first argument is a file descriptor returned from a call to open; the second is one of the following symbolic names.

LOCK_SH /* shared lock */
LOCK_EX /* exclusive lock */
LOCK_UN /* Unlock a file */
LOCK_NB /* Nonblocking lock */

LOCK_NB is not used on its own; it is combined with LOCK_SH or LOCK_EX using bitwise OR so that the call fails instead of blocking. A process that wants to write to a potentially shared file first calls flock with LOCK_EX as its second argument. If another process or thread has already locked the file with flock, this process will block until the other process unlocks the file. To unlock a file, use LOCK_UN as the second argument.
This file lock is an advisory lock. This means that a rogue process can ignore the lock and write to a locked file. However, if all of the processes which could potentially modify a file use this mechanism, it can prevent data loss as a result of contention.
The flock call locks the whole file, and this can produce serious performance problems if there are many processes trying to update a database concurrently. The newer Unix file locking call is lockf. This locks only a portion of the file, a single record in a database for example, rather than the entire file, so that many concurrent processes can update a database at the same time as long as they are not trying to update the same record.
Here is the function prototype

#include <unistd.h>

int lockf(int fildes, int function, off_t size);

The first argument is the file descriptor, the second argument is one of the following:

#define F_ULOCK 0 /* unlock previously locked section */
#define F_LOCK 1 /* lock section for exclusive use */
#define F_TLOCK 2 /* test & lock section for exclusive use */
#define F_TEST 3 /* test section for other locks */

The two of these that you are most likely to need are F_LOCK to lock the file and F_ULOCK to unlock it.
The third argument is the number of bytes to lock. These start at the current file offset, i.e. the next place in the file where a read or write will occur.
This is also an advisory rather than a mandatory lock. Both flock and lockf will put the calling process or thread to sleep if the file or section of the file is locked, and awaken the process or thread when the lock is released.
More Windows APIs
The API to open a file in Windows is the inappropriately named CreateFile.

HANDLE CreateFile(
    LPCTSTR lpFileName,                         // pointer to name of the file
    DWORD dwAccess,                             // access (read-write) mode
    DWORD dwShareMode,                          // share mode
    LPSECURITY_ATTRIBUTES lpSecurityAttributes, // pointer to security attributes
    DWORD dwCreate,                             // how to create
    DWORD dwFlagsAndAttributes,                 // file attributes
    HANDLE hTemplateFile                        // handle to file with attributes to copy
);

dwAccess can be either GENERIC_READ or GENERIC_WRITE or both.
dwShareMode can be 0 (no sharing) or a combination of
FILE_SHARE_READ
FILE_SHARE_WRITE
FILE_SHARE_DELETE
dwCreate can have these values
CREATE_NEW
CREATE_ALWAYS
OPEN_EXISTING
OPEN_ALWAYS
TRUNCATE_EXISTING
To close a file, use the generic API BOOL CloseHandle(HANDLE hObject)
Here is the API for reading a file.

BOOL ReadFile(
    HANDLE hFile,                // handle of file to read
    LPVOID lpBuffer,             // pointer to buffer that receives data
    DWORD nNumberOfBytesToRead,  // number of bytes to read
    LPDWORD lpNumberOfBytesRead, // pointer to number of bytes read
    LPOVERLAPPED lpOverlapped    // pointer to structure for data
);

This works much like the Unix read function, except that the number of bytes read is returned through the fourth argument and the return value only indicates success or failure. The last argument should be NULL.
Here is the API for writing to a file.

BOOL WriteFile(
    HANDLE hFile,                   // handle to file to write to
    LPCVOID lpBuffer,               // pointer to data to write to file
    DWORD nNumberOfBytesToWrite,    // number of bytes to write
    LPDWORD lpNumberOfBytesWritten, // pointer to number of bytes written
    LPOVERLAPPED lpOverlapped       // pointer to structure for overlapped I/O
);