Malleable iterative MPI applications

Malleability enables a parallel application's execution system to split or merge processes, thereby modifying the application's granularity. While process migration is widely used to adapt applications to dynamic execution environments, it is limited by the granularity of the application's processes. Malleability extends process migration by allowing the application's processes to expand or shrink in response to changes in resource availability. We have implemented malleability as an extension to the process checkpointing and migration (PCM) library, a user-level library for iterative message passing interface (MPI) applications. PCM is integrated with the Internet Operating System (IOS), a framework for middleware-driven dynamic application reconfiguration. Our approach requires minimal code modifications and enables transparent, middleware-triggered reconfiguration. Experimental results using a two-dimensional data-parallel program with a regular communication structure demonstrate the usefulness of malleability. Copyright © 2008 John Wiley & Sons, Ltd.
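
To make the instrumentation pattern described above concrete, the C sketch below shows how an iterative MPI kernel might probe for middleware-triggered reconfiguration at iteration boundaries. This is a minimal illustration only: the pcm_* names, types, and stub bodies are hypothetical placeholders assumed for this example and are not the published PCM API; only the MPI_* calls are standard MPI.

```c
/*
 * Illustrative sketch: an iterative MPI kernel instrumented with
 * PCM-style reconfiguration hooks.  The pcm_* names, types, and stub
 * bodies are hypothetical placeholders (not the published PCM API);
 * only the MPI_* calls are standard MPI.
 */
#include <mpi.h>
#include <stdlib.h>

typedef enum { PCM_NO_CHANGE, PCM_RECONFIGURE } pcm_status_t;

/* Stub: in a real system the middleware (e.g. IOS) would answer this. */
static pcm_status_t pcm_probe(void) { return PCM_NO_CHANGE; }

/* Stub: would save the registered application state. */
static void pcm_checkpoint(const double *data, int n, int iter)
{ (void)data; (void)n; (void)iter; }

/* Stub: would split/merge processes and redistribute the data block;
   the communicator and local block size may change as a result. */
static void pcm_reconfigure(double **data, int *n, MPI_Comm *comm)
{ (void)data; (void)n; (void)comm; }

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    MPI_Comm comm = MPI_COMM_WORLD;

    int rank, size;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &size);

    /* Each process owns one block of a (1-D here, for brevity) domain. */
    int local_n = 1024 / size;
    double *local = calloc((size_t)local_n, sizeof *local);

    for (int iter = 0; iter < 1000; iter++) {
        /* ... boundary exchange and local update go here (omitted) ... */

        /* Iteration boundary: a natural reconfiguration point, since
           no application messages are in flight. */
        if (pcm_probe() == PCM_RECONFIGURE) {
            pcm_checkpoint(local, local_n, iter);
            pcm_reconfigure(&local, &local_n, &comm);
            MPI_Comm_rank(comm, &rank);
            MPI_Comm_size(comm, &size);
        }
    }

    free(local);
    MPI_Finalize();
    return 0;
}
```

Placing the probe at the end of each iteration reflects the paper's focus on iterative applications: at that point the distributed state is well defined and no messages are outstanding, so processes can be split, merged, or migrated with only a data redistribution step.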
