Dynamic load balancing of distributed SPMD computations with explicit message-passing

Distributed systems have the potential to become an alternative platform for parallel computation. However, many obstacles remain. One of the most serious is that distributed systems typically consist of shared, heterogeneous components whose available computational power varies widely. We present a load balancing support that monitors the load status and, when necessary, adapts the workload to dynamic platform conditions by migrating data from overloaded to underloaded nodes. Unlike task migration supports for task-parallel applications and other data migration frameworks for master/slave parallel applications, our support works for the entire class of regular SPMD applications with explicit communications, such as linear algebra problems, partial differential equation solvers, and image processing algorithms. Although we considered several variants (three activation mechanisms, three load monitoring techniques, and four decision policies), we implemented only the protocols that guarantee program consistency. The efficiency of the strategies is evaluated on two SPMD algorithms based on the PVM library, enriched with special-purpose primitives for data management. As an additional contribution, the entire dynamic load balancing support is transparent to the programmer: its only visible interface is the activation phase.
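To make the idea of data migration from overloaded to underloaded nodes concrete, here is a minimal sketch of one possible decision step. It assumes a one-dimensional block (strip) decomposition of rows, per-node load measured as processing speed in rows per second, and a simple threshold policy that redistributes rows in proportion to speed. The names `target_partition`, `imbalance`, `plan_migrations`, and the `threshold` parameter are hypothetical; this is not the authors' policy and it ignores the program-consistency issues that the paper's protocols address.

```python
from typing import List, Tuple


def target_partition(total_rows: int, speeds: List[float]) -> List[int]:
    """Split total_rows proportionally to measured speeds (largest-remainder rounding)."""
    total_speed = sum(speeds)
    exact = [total_rows * s / total_speed for s in speeds]
    counts = [int(x) for x in exact]
    leftover = total_rows - sum(counts)
    # Hand the remaining rows to the nodes with the largest fractional parts.
    order = sorted(range(len(speeds)), key=lambda i: exact[i] - counts[i], reverse=True)
    for i in order[:leftover]:
        counts[i] += 1
    return counts


def imbalance(rows: List[int], speeds: List[float]) -> float:
    """Relative excess of the slowest node's predicted time over the ideal balanced time."""
    times = [r / s for r, s in zip(rows, speeds)]
    ideal = sum(rows) / sum(speeds)
    return max(times) / ideal - 1.0


def plan_migrations(rows: List[int], speeds: List[float],
                    threshold: float = 0.1) -> List[Tuple[int, int, int]]:
    """Return (src, dst, count) transfers between neighbouring nodes.

    Hypothetical threshold policy: migrate only if the predicted imbalance
    exceeds `threshold`. Flows are net amounts across each block boundary,
    so contiguous blocks stay contiguous after migration.
    """
    if imbalance(rows, speeds) <= threshold:
        return []
    target = target_partition(sum(rows), speeds)
    moves = []
    cur_prefix = tgt_prefix = 0
    for i in range(len(rows) - 1):
        cur_prefix += rows[i]
        tgt_prefix += target[i]
        flow = cur_prefix - tgt_prefix  # > 0: node i sends rows to node i + 1
        if flow > 0:
            moves.append((i, i + 1, flow))
        elif flow < 0:
            moves.append((i + 1, i, -flow))
    return moves


if __name__ == "__main__":
    # Four nodes with 100 rows each; the last node is twice as fast.
    print(plan_migrations([100, 100, 100, 100], [1.0, 1.0, 1.0, 2.0]))
```

Running the example prints `[(0, 1, 20), (1, 2, 40), (2, 3, 60)]`, leaving the nodes with 80, 80, 80, and 160 rows. Shifting rows only between neighbours keeps each node's block contiguous, which suits regular SPMD codes whose communication pattern depends on block adjacency; a real support would also have to order the transfers so that no node forwards rows it has not yet received.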
