Ph.D. Theses

Asynchronous Global Optimization for Massive-Scale Computing

By Travis Desell
Advisor: Boleslaw K. Szymanski, Travis Desell
November 19, 2009

The use of massive-scale heterogeneous computing environments is becoming more widespread, through the use of different Internet computing technologies such as the Berkeley Open Infrastructure for Network Computing (BOINC). Volunteer computing infrastructures can offer milions of computing hosts, approaching or surpassing petascale computing power. The low cost and potential computing power of these environments makes them highly desirable to researchers in a broad range of scientific disciplines, where the rate of data acquisition and increasing modeling complexity is far outpacing improvements in processor speed. Even the amount of parallelism on homogeneous computing environments is reaching massive scales, with supercomputers utilizing hundreds of thousands of processors and graphical processing units approaching thousands of parallel threads.

However, effectively utilizing these computing environments for scientific modeling involves significant challenges. While traditional optimization methods for scientific modeling are more applicable to homogeneous and highly reliable computing environments, they cannot be applied to heterogeneous and fault prone environments due to their sequential and iterative nature. Lost or invalid results must be recalculated before the optimization methods can progress, which drastically reduces their performance and efficiency. Additionally, these optimization methods are based on evolving populations, which limits their scalability to the population size and prevents them from being able to utilize massive-scale computing environments.

This work examines modifying traditional global optimization methods, such as differential evolution, genetic search and particle swarm optimization for use on massive-scale computing environments. A generic asynchronous, or non-iterative strategy, for global optimization is applied to all three search types, resulting in variants that can easily scale to hundreds of thousands of computing hosts, that are automatically load balanced by design, and are unaffected by unresponsive hosts. Additionally, verification strategies are examined which prevent malicious or faulty hosts from affecting the progress of the optimization.

The asynchronous optimization methods have been tested on different computing environments using a representative astroinformatics application which evaluates the fitness of three dimensional models of the Milky Way galaxy. Optimization has been done on RPI's CCNI BlueGene/L supercomputer as well as the MilkyWay@Home volunteer computing system. MilkyWay@Home currently consists of over 50,000 computing hosts from over 25,000 volunteers, providing a highly heterogeneous and fault prone computing environment, and this work has enabled the use of this powerful computing system (currently running at 216 teraflops) to advance knowledge of the structure, origin and evolution of the Milky Way galaxy.

In addition to actual massive scale computing systems, the asynchronous optimization methods have also been tested using simulated environments and challenging benchmark equations to measure the scalability and the effect of heterogeneity on these methods. When compared on simulated homogeneous computing environments, representative of supercomputers or graphical processing units, traditional parallel methods are shown to generally take longer to reach solutions as they approach massive scale, while the asynchronous methods are shown to improve linearly as more hosts are added. In addition, the asynchronous optimization methods are shown to be resilient to highly asynchronous computing environments, with results showing that as the range of time for simulated hosts to report results increases (representing increased heterogeneity), the asynchronous optimization methods are shown to require similar, or even less evaluations to reach a solution.

As computing environments continue to become increasingly distributed and heterogeneous, even at the processor level, this work shows that optimization methods must embrace an asynchronous and distributed computing ideology to remain effective and efficient. The results show that the generic asynchronous optimization strategy developed by this work provides a strong starting point for the development of even more efficient and highly scalable distributed optimization methods.

Return to main PhD Theses page