Policies for swapping MPI processes

Despite the enormous amount of research and development work in the area of parallel computing, it is a common observation that simultaneous performance and ease-of-use are elusive. We believe that ease-of-use is critical for many end users, and thus seek performance enhancing techniques that can be easily retrofitted to existing parallel applications. In a precious paper we have presented MPI (message passing interface) process swapping, a simple add-on to the MPI programming environment that can improve performance in shared computing environments. MPI process swapping requires as few as three lines of source code change to an existing application. In this paper we explore a question that we had left open in our previous work: based on which policies should processes be swapped for best performance? Our results show that, with adequate swapping policies, MPI process swapping can provide substantial performance benefits with very limited implementation effort.

[1]  Weiping Zhu,et al.  An experimental study of load balancing on Amoeba , 1995, Proceedings the First Aizu International Symposium on Parallel Algorithms/Architecture Synthesis.

[2]  R. Diekman,et al.  Load balancing strategies for distributed memory machines , 2000 .

[3]  William Gropp,et al.  Mpi - The Complete Reference: Volume 2, the Mpi Extensions , 1998 .

[4]  Richard Wolski,et al.  The network weather service: a distributed resource performance forecasting service for metacomputing , 1999, Future Gener. Comput. Syst..

[5]  Robert Elsässer,et al.  Diffusion Schemes for Load Balancing on Heterogeneous Networks , 2002, Theory of Computing Systems.

[6]  Miron Livny,et al.  Condor-a hunter of idle workstations , 1988, [1988] Proceedings. The 8th International Conference on Distributed.

[7]  Sathish S. Vadhiyar,et al.  A metascheduler for the Grid , 2002, Proceedings 11th IEEE International Symposium on High Performance Distributed Computing.

[8]  Francine Berman,et al.  Applying scheduling and tuning to on-line parallel tomography , 2001, SC '01.

[9]  Anthony Skjellum,et al.  MPI/FT/sup TM/: architecture and taxonomies for fault-tolerant, message-passing middleware for performance-portable parallel computing , 2001, Proceedings First IEEE/ACM International Symposium on Cluster Computing and the Grid.

[10]  Henk-Evert Sonder,et al.  The Microsoft .NET Framework , 2001 .

[11]  Peter A. Dinda,et al.  The statistical properties of host load , 1999, Sci. Program..

[12]  Mor Harchol-Balter,et al.  Exploiting process lifetime distributions for dynamic load balancing , 1995, SIGMETRICS.

[13]  Gilles Fedak,et al.  XtremWeb: a generic global computing system , 2001, Proceedings First IEEE/ACM International Symposium on Cluster Computing and the Grid.

[14]  John Shalf,et al.  The Cactus Worm: Experiments with Dynamic Resource Discovery and Allocation in a Grid Environment , 2001, Int. J. High Perform. Comput. Appl..

[15]  Francine Berman,et al.  Modeling the Cost of Redistribution in Scheduling , 1997, PPSC.

[16]  Message Passing Interface Forum MPI: A message - passing interface standard , 1994 .

[17]  Francine Berman,et al.  Toward a framework for preparing and executing adaptive grid programs , 2002, Proceedings 16th International Parallel and Distributed Processing Symposium.

[18]  Jeffrey S. Vetter,et al.  Autopilot: adaptive control of distributed applications , 1998, Proceedings. The Seventh International Symposium on High Performance Distributed Computing (Cat. No.98TB100244).

[19]  Edward D. Lazowska,et al.  The limited performance benefits of migrating active processes for load sharing , 1988, SIGMETRICS '88.

[20]  A. Adas,et al.  Traffic models in broadband networks , 1997, IEEE Commun. Mag..

[21]  Ian T. Foster,et al.  MPICH-G2: A Grid-enabled implementation of the Message Passing Interface , 2002, J. Parallel Distributed Comput..

[22]  Thierry Le Sergent,et al.  Balancing Load under Large and Fast Load Changes in Distributed Computing Systems - A Case Study , 1994, CONPAR.

[23]  Georg Stellner,et al.  CoCheck: checkpointing and process migration for MPI , 1996, Proceedings of International Conference on Parallel Processing.

[24]  Francine Berman,et al.  Application-Level Scheduling on Distributed Heterogeneous Networks , 1996, Proceedings of the 1996 ACM/IEEE Conference on Supercomputing.

[25]  Thomas Hérault,et al.  MPICH-V: Toward a Scalable Fault Tolerant MPI for Volatile Nodes , 2002, ACM/IEEE SC 2002 Conference (SC'02).

[26]  Richard Wolski,et al.  Dynamically forecasting network performance using the Network Weather Service , 1998, Cluster Computing.

[27]  William Gropp,et al.  MPI: The Complete Reference , Vol. 2 - The MPI-2 Extensions , 1998 .

[28]  Henri Casanova,et al.  Scheduling distributed applications: the SimGrid simulation framework , 2003, CCGrid 2003. 3rd IEEE/ACM International Symposium on Cluster Computing and the Grid, 2003. Proceedings..

[29]  Anthony Skjellum,et al.  A High-Performance, Portable Implementation of the MPI Message Passing Interface Standard , 1996, Parallel Comput..

[30]  George Cybenko,et al.  Dynamic Load Balancing for Distributed Memory Multiprocessors , 1989, J. Parallel Distributed Comput..

[31]  Veljko M. Milutinovic,et al.  Distributed shared memory: concepts and systems , 1997, IEEE Parallel Distributed Technol. Syst. Appl..

[32]  Dean Sutherland,et al.  The architecture of the Remos system , 2001, Proceedings 10th IEEE International Symposium on High Performance Distributed Computing.

[33]  Rohit Chandra,et al.  Parallel programming in openMP , 2000 .

[34]  S. Zhou,et al.  A Trace-Driven Simulation Study of Dynamic Load Balancing , 1987, IEEE Trans. Software Eng..

[35]  Alan Heirich,et al.  A Competitive Analysis of Load Balancing Strategies for Parallel Ray Tracing , 2004, The Journal of Supercomputing.

[36]  Warren Smith,et al.  A directory service for configuring high-performance distributed computations , 1997, Proceedings. The Sixth IEEE International Symposium on High Performance Distributed Computing (Cat. No.97TB100183).

[37]  Francine Berman,et al.  Heuristics for scheduling parameter sweep applications in grid environments , 2000, Proceedings 9th Heterogeneous Computing Workshop (HCW 2000) (Cat. No.PR00556).

[38]  Jack Dongarra,et al.  MPI: The Complete Reference , 1996 .

[39]  Jack J. Dongarra,et al.  FT-MPI: Fault Tolerant MPI, Supporting Dynamic Applications in a Dynamic World , 2000, PVM/MPI.

[40]  Henri Casanova,et al.  MPI Process Swapping: Architecture and Experimental Verification , 2003 .

[41]  Teunis J. Ott,et al.  Load-balancing heuristics and process behavior , 1986, SIGMETRICS '86/PERFORMANCE '86.