* Faculty       * Staff       * Contact       * Institute Directory
* Research Groups      
* Undergraduate       * Graduate       * Institute Admissions: Undergraduate | Graduate      
* Events       * Institute Events      
* Lab Manual       * Institute Computing      
No Menu Selected

* Research

Ph.D. Theses

Adaptive Parallel Computations on Networks of Workstations

By Mohan V. Nibhanupudi
Advisor: Boleslaw K. Szymanski
April 21, 1998

In this thesis we consider parallel computations on networks of nondedicated workstations. The workstations can be used for executing additional parallel computations when idle; however, the computation must be suspended when owner activity resumes. Because of interdependencies between component processes, parallel programs cannot make progress if some of the participating workstations become unavailable during the computation. To deliver acceptable performance, parallel programs executing in such environments must adapt to the changing computing environment.

We propose a new approach to adaptive parallelism in networks of nondedicated workstations. Our approach, referred to as the adaptive replication scheme (ARS), is based on the Bulk-Synchronous Parallel (BSP) model. Using several applications, we demonstrate that the BSP model can be used to build efficient implementations of parallel algorithms on networks of workstations.

The adaptive replication scheme relies on recovering the computations of a failed process by replicating its computations on another available workstation and eventually migrating the recovered process to a new available workstation. Replication of computations is made possible by eagerly replicating the computation state of each process on a backup process. We analyze the performance of a parallel computation using adaptive replication scheme for exponentially distributed available and nonavailable periods of workstations. The scheme is scalable when the underlying communication network is scalable.

We designed and implemented the adaptive replication scheme on top of an existing library for the BSP model. The new library is referred to as the Adaptive BSP (or ABSP) library. We describe adaptive parallel extensions to the BSP library and a protocol for replication of computation state and recovery of failed processes. To insulate the application programs from the implementation, we designed the ABSP library in layers. We describe the design and implementation of these layers. Finally we present the results of application of the ABSP library to two parallel applications - a plasma simulation using particle-in-cell method and a graph search algorithm, executing on a network of nondedicated workstations.

* Return to main PhD Theses page