Adaptive Parallelism in the Bulk-Synchronous Parallel Model

The Bulk-Synchronous Parallel (BSP) model is a universal abstraction of parallel computation that can be used to design portable parallel software. Advances in processor architecture and network communication enable clusters of workstations to be used as parallel computers. This paper focuses on using the idle computing power of a network of workstations to run parallel programs. Because these processors are transient, available only while their workstations are idle, straightforward execution of synchronous BSP programs performs poorly in such an environment. We propose a scheme, based on the eager replication of state data and lazy replication of processes, that allows BSP programs to run efficiently on transient processors. The scheme is integrated into the Oxford BSP library.