Fault Tolerant MPI for the HARNESS Meta-computing System

Initial versions of MPI were designed to work efficiently on multiprocessors that offered very little job control and therefore supported only static process models. Retrofitting those implementations with a dynamic process model suitable for clusters or distributed systems would have reduced their performance. As current HPC collaborative applications grow in size and distribution, the potential for node and network failures rises, and the need arises for new fault-tolerant systems. Here we present a new implementation of MPI, called FT-MPI, that allows the semantics and associated modes of failure to be explicitly controlled by an application via a modified MPI API. We give an overview of the FT-MPI semantics and design, example applications, and performance issues such as efficient group communication and the handling of complex data.
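
To illustrate the kind of application-level failure control described above, the following C sketch shows one plausible pattern: the program asks for communication errors to be returned to the caller rather than aborting the job, and branches into a recovery phase when a collective fails. The standard MPI calls used here (MPI_Comm_set_errhandler, MPI_ERRORS_RETURN, MPI_Bcast) are real; the recovery step itself is an assumption for illustration and does not reproduce the actual FT-MPI interface or its failure-mode names.

    /*
     * Hedged sketch of application-controlled failure handling in an
     * MPI program. Standard MPI calls only; the recovery helper is a
     * hypothetical placeholder, not part of FT-MPI's real API.
     */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, rc;

        MPI_Init(&argc, &argv);

        /* Ask that communication errors be returned to the caller
           instead of aborting the whole job (standard MPI error-
           handling machinery). */
        MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);

        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        int token = rank;
        rc = MPI_Bcast(&token, 1, MPI_INT, 0, MPI_COMM_WORLD);

        if (rc != MPI_SUCCESS) {
            /* Under a "rebuild"-style semantics the application would
               re-enter a recovery phase here, e.g. reloading state from
               a checkpoint, rather than terminating. recover_state() is
               hypothetical. */
            fprintf(stderr, "rank %d: broadcast failed, recovering\n", rank);
            /* recover_state(); */
        }

        MPI_Finalize();
        return 0;
    }

The design point this pattern captures is that the surviving processes, not the runtime alone, decide whether a failure means abort, shrink, or rebuild and continue.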
