CSCI.4210 Operating Systems Fall, 2009 Class 20
Multiprocessors, Multicomputers

Review of Multiprocessor Systems

There are three different models of multiprocessor systems.

In the last class we discussed shared memory multiprocessors.

Multicomputers

The second computer system architecture is the multicomputer, in which each CPU has its own memory but the CPUs are tightly coupled. Processes can no longer use shared memory for communication, so interprocess communication is done with message passing. Because the messages need to go over a network, message passing communication is several orders of magnitude slower than communication through shared memory. A group of tightly coupled computers like this is called a cluster.
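
To make this concrete, here is a minimal sketch of message passing between two nodes, written against MPI, a message-passing library commonly used on clusters. (MPI is not covered in these notes; it appears here only as an illustration, and the message contents are made up.)

#include <stdio.h>
#include <string.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank;
    char buf[64];

    MPI_Init(&argc, &argv);                /* join the cluster job */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* which node am I? */

    if (rank == 0) {
        strcpy(buf, "hello from node 0");
        /* send the buffer to node 1; no shared memory is involved */
        MPI_Send(buf, strlen(buf) + 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        /* block until the message arrives over the interconnect */
        MPI_Recv(buf, sizeof(buf), MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("node 1 received: %s\n", buf);
    }

    MPI_Finalize();
    return 0;
}

Note that the send and receive each name the other node explicitly; every byte exchanged crosses the network, which is why this is so much slower than a shared-memory write.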

Clusters provide a number of attractive features.

There are several methods of connecting the nodes together. One obvious method is using Ethernet, which is a broadcast protocol. However, other systems use various point-to-point protocols. For example, the nodes can be connected using a grid or mesh.

One interesting example of this is a hypercube. The number of nodes (CPUs) in a hypercube should be a power of 2. Picture a cube, with the corners as nodes and the edges between them as connections. There are eight nodes, and no node is more than three hops from any other node; it takes three hops to get from the top left front node to the bottom right back node.

A four-dimensional hypercube would have 16 nodes, and a maximum of four hops to get from one node to another.

A five-dimensional hypercube would have 32 nodes and a maximum of five hops to get from one node to another, and so on.
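
The arithmetic behind these hop counts is simple. Label each node of an n-dimensional hypercube with an n-bit number, and connect two nodes exactly when their labels differ in one bit; the number of hops between two nodes is then the number of bit positions in which their labels differ (the Hamming distance), which is at most n. A short sketch in C:

#include <stdio.h>

/* Hops between two hypercube nodes = number of 1 bits in the
 * XOR of their labels (the Hamming distance). */
int hops(unsigned a, unsigned b)
{
    unsigned diff = a ^ b;   /* bits where the labels differ */
    int count = 0;
    while (diff) {
        count += diff & 1;
        diff >>= 1;
    }
    return count;
}

int main(void)
{
    printf("%d\n", hops(0, 7));   /* 3-cube corners 000 and 111: 3 hops */
    printf("%d\n", hops(0, 15));  /* 4-cube corners 0000 and 1111: 4 hops */
    return 0;
}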

Clusters often include middleware to provide a single image to all of the users; that is, the fact that there are numerous nodes is transparent to the typical user. A user logs into the cluster, not onto a particular machine in the cluster, and sees a single file system, such as NFS, a common user interface, and a common view of resources and devices.

Even though each node in the cluster has its own memory, it is possible for a cluster to provide a single memory image to all of the CPUs; this is called Distributed Shared Memory (DSM). Each page is in the memory of one of the nodes, but all nodes have access to it. This is made possible by virtual memory. When a CPU tries to access a page that it does not have in its page table, the OS locates the page on the network (perhaps on some other computer), the page is sent over the network to the requesting node, and there it is mapped into the page table.

This can result in inconsistencies if a CPU writes to its own copy of a page. As a result, whenever a process is about to execute a write to a virtual memory page, a message is sent to all the other CPUs which hold a copy of that page, telling them to unmap and discard it.
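
Here is a sketch of what the fault-handling path might look like. The helper functions (page_number, page_present, locate_owner, and so on) are invented for illustration; a real DSM implementation is considerably more involved.

/* Hypothetical helper interface, invented for this sketch */
extern int  page_number(void *addr);
extern int  page_present(int page);
extern int  locate_owner(int page);
extern void fetch_page(int owner, int page);
extern void map_page(int page, int prot);
extern void broadcast_invalidate(int page);
#define READ_ONLY  0
#define READ_WRITE 1

/* Called by the kernel when a process faults on a DSM page */
void dsm_fault(void *addr, int is_write)
{
    int page = page_number(addr);

    /* If the page is not mapped locally, fetch a copy over the
     * network from the node that currently holds it. */
    if (!page_present(page)) {
        int owner = locate_owner(page);
        fetch_page(owner, page);
    }

    if (is_write) {
        /* Before writing, tell every other node holding a copy to
         * unmap and discard it, so no stale copies survive. */
        broadcast_invalidate(page);
        map_page(page, READ_WRITE);
    } else {
        map_page(page, READ_ONLY);
    }
    /* the faulting instruction is then restarted */
}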

Load balancing is a major issue in such systems. Once a process has been assigned to a node, it is difficult to move it, so if the system is busy it is important to keep all of the processors occupied. It is easy for a situation to develop where one node is overworked while another sits idle.

There are a number of algorithms for this, depending on how much information is available to the scheduler. Here are three:

In order to implement load balancing, there must be a mechanism to migrate a process from one node to another. There are several ways to do this.

A final issue which has to be resolved in process migration is the status of outstanding signals and messages.

The standard cluster software for a bunch of Linux machines is Beowulf.

Virtualization

A virtual machine (VM) is a software implementation of a machine (i.e. a computer) that executes programs like a physical machine. Java programs typically run in a virtual machine, the Java Virtual Machine (JVM), which is shipped as part of the Java Runtime Environment (JRE). Cygwin is a virtual machine of sorts.

Virtualization as applied to operating systems refers to running several different OSs (or multiple instances of the same OS) on a single machine. This concept has been around for a long time; in the 1970s, IBM had a mainframe operating system called VM in which each user was given their own instance of the OS. The company VMware has made virtualization popular, although there are other virtualization technologies such as Solaris Zones. Virtualization has become so important that Intel has added features to its processors to assist it. This technology, which Intel calls VT, creates containers in which virtual machines can be run.

Here are some advantages of virtualization.

The main disadvantage is that it is less efficient than a real machine because it has to access the hardware indirectly.

One obvious use is software testing. A company can test software on multiple platforms, including different versions of an operating system. Also, each guest OS can be treated as a sandbox: an environment, isolated from the production environment, in which to test potentially buggy or virus-infected code.

The underlying software that controls all of this is called a hypervisor, and it is the true operating system. The other operating systems (the ones that the users see) are called guest operating systems. Whenever a process on a guest OS tries to execute a privileged instruction, the attempt is intercepted by the hypervisor.

There are two types of hypervisors, Type 1 and Type 2.

A type 1 hypervisor runs on the bare metal; a type 2 hypervisor runs as a process on the host OS (Linux, Windows, or whatever).

There are a number of problems that virtualization has to solve. The first is that the guest operating system will have to execute privileged instructions (recall from the first class that there are some processor instructions which should only be executed in kernel mode). In all types of virtualization, these privileged instructions are trapped by the hypervisor rather than run directly, and the hypervisor then executes them.

VMware uses a type 2 hypervisor. In this case, as guest code is run, it is inspected to find basic blocks, that is, straight-line runs of instructions ending in a jump or other control transfer. If a block contains privileged instructions, the hypervisor rewrites them into code that it handles itself; otherwise, the block runs directly on the hardware, so there is no slowdown. This can work because once a block has been inspected and cached, it does not need to be reprocessed.

Another problem is memory virtualization. Each guest OS thinks that it can see the entire memory, and of course each has its own page table. What happens if two different guests try to map to the same physical page? The hypervisor solves this problem by creating a shadow page table that maps the virtual pages used by the virtual machine onto real pages. This can result in a performance penalty, because every change to a guest's virtual page table requires an update to the shadow page table as well.
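
A rough sketch of the bookkeeping, with invented types and helpers (guest_to_host stands in for the hypervisor's private record of which real frames it gave each VM):

typedef unsigned long pfn_t;    /* page frame number */

/* Hypothetical helpers, invented for this sketch */
extern pfn_t guest_to_host(int vm, pfn_t guest_pfn);
extern void  shadow_set(int vm, pfn_t vpage, pfn_t host_pfn);

/* Called when the hypervisor intercepts a guest writing one of its
 * own page table entries: virtual page vpage now maps to what the
 * guest believes is physical frame guest_pfn. */
void on_guest_pte_write(int vm, pfn_t vpage, pfn_t guest_pfn)
{
    /* Translate the guest's notion of a physical frame into the
     * real frame the hypervisor assigned to this VM, so two guests
     * that pick the same "physical" frame land on different ones. */
    pfn_t host_pfn = guest_to_host(vm, guest_pfn);

    /* Mirror the change into the shadow table that the MMU actually
     * walks; this extra update is the performance penalty. */
    shadow_set(vm, vpage, host_pfn);
}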

Distributed Computing

Both multiprocessor systems and multicomputers are more or less tightly coupled; typically all of the computers are in the same room, and often in the same rack. The idea behind distributed computing is that many essentially independent computers, often widely distributed, can work together on a problem. A current initiative in our department is world wide computing, sometimes called Grid Computing, in which computers all over the world work together, communicating over the Internet.

There are few special implications for the design of operating systems for computers which will be doing loosely coupled, distributed computing. Each node simply runs its own native operating system. Note that it is possible, and indeed likely, that different nodes are running different operating systems. Interprocess communication between processes on different computers is done using standard protocols over the network.

We can expand our definition of an operating system to include any system which coordinates computation, even including widely distributed computation. If we do this, a distributed system introduces new problems.

One problem with distributed computing over the Internet is that there can be significant and unpredictable delays in transmission. This is a problem for some applications, such as distributed games, because it is important to have a consistent global state, and that is difficult when communication lags are substantial and unpredictable. For example, suppose Alice and Bob are playing a distributed game in which users score points by killing bad guys, and both have a chance to kill the same bad guy at about the same time. Alice may kill the bad guy on her machine while Bob kills the same bad guy on his, because the news that Alice had already killed him has not yet reached Bob's machine. This is an inconsistent state: each player thinks that they should get the points for killing the bad guy.

For a more formal analysis of the problem, I'll use a more mundane example (borrowed from Stallings, Operating Systems). An individual has accounts in two different branches of a bank, and every day at 3:00 each branch sends its balance to a central server. At about 3:00, each account has $100, and the account holder transfers $20 from the account in Branch A to the account in Branch B. There are two scenarios in which this could result in the wrong answer being stored on the central server: the server could record Branch A's balance after the transfer was sent ($80) but Branch B's before it arrived ($100), for a total of $180 rather than $200; or it could record Branch A's balance before the transfer ($100) and Branch B's after it arrived ($120), for a total of $220.

One solution to this problem is the distributed snapshot algorithm. This assumes that messages are delivered in the order that they are sent, and no messages are lost. Note that TCP satisfies this requirement.

The algorithm uses a control message called a marker. Some process initiates the algorithm by recording its state and sending out a marker on all of its outgoing channels before any other messages are sent. Each process p, when it first receives the marker from another process, say q, does the following: it records its own local state, records the state of the channel from q to p as empty, and sends out a marker on all of its own outgoing channels before sending any other messages.

At any time after recording its state, when process p receives a marker from process r, it records the state of the channel from r to p as the sequence of messages that it received from the time it recorded its local state to the time that it received the marker from r.

Each process terminates when it receives the marker from all other processes. A globally consistent state can be recorded by having every process send its state data along every outgoing channel.
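
This is the Chandy-Lamport snapshot algorithm. Here is a sketch of the logic at one process; the message-passing helpers are invented for illustration.

#define NPROC 3                 /* number of processes, for illustration */

/* Hypothetical helpers, invented for this sketch */
extern void send_marker_on_all_channels(void);
extern void record_channel_message(int from, const char *msg);

static int recorded = 0;        /* have we recorded our local state? */
static int marker_seen[NPROC];  /* marker received on channel from q? */

static void record_local_state(void)
{
    recorded = 1;
    /* ... save balances or whatever constitutes local state ... */
    send_marker_on_all_channels();  /* markers go out before any
                                       further ordinary messages */
}

/* Called when a marker arrives on the channel from process q */
void on_marker(int q)
{
    if (!recorded)
        record_local_state();   /* first marker: the channel from q is
                                   recorded as empty, since FIFO delivery
                                   means nothing sent before the marker
                                   is still behind it */
    marker_seen[q] = 1;         /* the channel from q is now closed */
}

/* Called when an ordinary message arrives from process q */
void on_message(int q, const char *msg)
{
    if (recorded && !marker_seen[q])
        record_channel_message(q, msg); /* in flight when the snapshot
                                           started: part of the state
                                           of the channel from q */
    /* ... then process the message normally ... */
}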

Using a marker to solve our bank problem is trivial. Suppose that when the server thinks that it's 3:00, it sends out a marker. Consider our first case, where Branch A has sent the message for the transfer but Branch B has not yet received it. Branch A receives the marker from the server and records its state (balance $80). Branch B receives the marker from the server and records its state (balance $100). But we're not done yet. Branch A sends the marker to Branch B, and by the rules that we set, it cannot arrive until after the transfer message; so when the marker does arrive, Branch B has already received the transfer message and records the $20 as part of the state of the channel from A. Branch B also sends the marker to Branch A. Now the algorithm terminates, and the recorded states are consistent: they account for the full $200.

Some Distributed Computing Architectures

The text (and everyone else) talks about middleware as if it were a separate layer in the protocol stack, but it isn't; it is just a set of protocols for getting work done on a distributed network. I will discuss several of these systems.

Document Based Middleware

The World Wide Web is based on a very simple protocol called HTTP (the Hypertext Transfer Protocol). A client (a web browser) sends a GET request for a document to a web server, using TCP/IP. Here is a simple GET request to retrieve the course home page. It would be sent to the CS department web server www.cs.rpi.edu on port 80.

GET /academics/courses/os/index.html HTTP/1.0 
A web browser would follow this with some additional header lines, but this request alone would work.

The web server (Apache) would retrieve the web page and send it back to the browser. The browser would then display this document.
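
There is nothing magic about the browser; any program that can open a TCP connection can speak HTTP. Here is a bare-bones sketch using the ordinary sockets API (most error handling omitted):

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <netdb.h>
#include <sys/socket.h>

int main(void)
{
    struct addrinfo hints, *res;
    const char *req =
        "GET /academics/courses/os/index.html HTTP/1.0\r\n\r\n";
    char buf[4096];
    int sock, n;

    /* look up the web server's address, port 80 */
    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_INET;
    hints.ai_socktype = SOCK_STREAM;
    if (getaddrinfo("www.cs.rpi.edu", "80", &hints, &res) != 0)
        return 1;

    sock = socket(res->ai_family, res->ai_socktype, res->ai_protocol);
    if (connect(sock, res->ai_addr, res->ai_addrlen) < 0)
        return 1;

    /* send the same GET request shown above; the blank line ends it */
    write(sock, req, strlen(req));

    /* print whatever comes back: headers first, then the page */
    while ((n = read(sock, buf, sizeof(buf))) > 0)
        fwrite(buf, 1, n, stdout);

    close(sock);
    freeaddrinfo(res);
    return 0;
}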

The web was initially conceived to retrieve static documents such as this, but very quickly, it was modified to include new features.

CGI (the Common Gateway Interface). This allows the client (the browser) to send a more specific request to the server. For example, a request to a stock market quote server might look something like this:

GET /quote?stock=IBM HTTP/1.0
Rather than returning a static document, the server would query a database to retrieve the current price of a share of IBM stock, format a document on the fly, and send it to the client.
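
On the server side, a CGI program is just an ordinary program: the server puts everything after the ? into the QUERY_STRING environment variable, runs the program, and sends its standard output back to the client. A toy version in C (the price is made up; a real quote server would query a database):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    /* the web server passes "stock=IBM" in QUERY_STRING */
    char *query = getenv("QUERY_STRING");
    char *stock = "unknown";

    if (query != NULL && strncmp(query, "stock=", 6) == 0)
        stock = query + 6;

    /* a CGI program writes an HTTP header, a blank line, and then
     * the document itself, all on standard output */
    printf("Content-Type: text/html\r\n\r\n");
    printf("<html><body>\n");
    printf("<p>Quote for %s: $127.50 (a made-up price)</p>\n", stock);
    printf("</body></html>\n");
    return 0;
}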

There are a number of widely used server technologies that have been developed to facilitate such CGI calls. Examples include the free Apache Tomcat server, which runs Java Server Pages; commercial products include Microsoft ASP, BEA WebLogic, ColdFusion, and IBM WebSphere.

Cookies. Initially HTTP was conceived as a stateless protocol: the server simply received requests and returned the appropriate documents. However, for more elaborate interactions, it was necessary for the server to send some information to the client to be held between requests. This information is called a cookie. The browser stores the cookie, and whenever it makes a new request to the same server it sends the cookie along, so that the server knows that the request is part of an ongoing session.
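
The mechanics are just two more HTTP headers. The server includes a Set-Cookie header in its response, and on every later request to that server the browser echoes the value back in a Cookie header (the session id below is made up):

HTTP/1.0 200 OK
Set-Cookie: session=abc123
(rest of the response follows)

GET /quote?stock=IBM HTTP/1.0
Cookie: session=abc123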
