Fault-Management in P2P-MPI

In this paper, we present a study of fault management in a grid middleware. The middleware is our home-grown software, P2P-MPI. This framework is MPJ-compliant, allows users to execute message-passing parallel programs, and is designed to support environments built from commodity hardware. On such hardware, program execution is failure-prone, so particular attention must be paid to fault management. Fault management covers two issues: fault tolerance and fault detection. Fault tolerance concerns program execution: P2P-MPI provides a transparent fault-tolerance facility based on the replication of computations. Fault detection concerns the system's monitoring of the program execution, which is carried out by a distributed set of modules called failure detectors. The contribution of this paper is twofold. The first contribution is an evaluation of the failure probability of an application as a function of the replication degree. Since the failure probability depends on the execution length, we propose a model to estimate the duration of a replicated parallel program. We then give an expression for the replication degree required to keep the failure probability of an execution below a given threshold. The second contribution is a study of the advantages and drawbacks of several fault detection systems found in the literature. Our evaluation criteria are the reliability of the failure detection service and the speed of failure detection. We retain the binary round-robin protocol for its detection speed, and we propose a variant of this protocol whose reliability, in all cases, exceeds that of the application execution it monitors. Experiments involving up to 256 processes, carried out on Grid’5000, show that the measured detection times closely match the predictions.
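The abstract does not spell out the failure model, so the following is only a minimal sketch, under a deliberately simple model that is our own assumption, of how a replication degree can be derived from a failure-probability threshold. We assume each of the n processes runs as r replicas on distinct hosts, hosts fail independently, each with probability f during the execution, and the execution fails only if every replica of some process fails. This gives P_fail(r) = 1 - (1 - f^r)^n, and the smallest admissible degree is r = ceil( ln(1 - (1 - ε)^(1/n)) / ln f ) for a threshold ε. The function names, and the exponential host lifetime used in `host_failure_prob`, are hypothetical. The paper's model of the replicated program's duration is not reproduced here, so the execution length (and hence f) is treated as a given input.

```python
import math


def host_failure_prob(rate: float, duration: float) -> float:
    """Probability that a single host fails within `duration`.

    Assumes an exponentially distributed host lifetime with failure `rate`
    (an assumption of this sketch, not the paper's availability model).
    """
    return 1.0 - math.exp(-rate * duration)


def app_failure_prob(n: int, r: int, f: float) -> float:
    """Failure probability of an application with n processes, each run as
    r replicas on distinct hosts that fail independently with probability f.

    The execution fails as soon as every replica of some process has failed.
    """
    if f >= 1.0:
        return 1.0
    # Numerically stable form of 1 - (1 - f^r)^n.
    return -math.expm1(n * math.log1p(-(f ** r)))


def min_replication_degree(n: int, f: float, threshold: float) -> int:
    """Smallest r such that app_failure_prob(n, r, f) <= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    if f <= 0.0:
        return 1  # hosts never fail: no replication needed
    if f >= 1.0:
        raise ValueError("hosts always fail: no replication degree suffices")
    # (1 - f^r)^n >= 1 - threshold  <=>  f^r <= 1 - (1 - threshold)^(1/n)
    bound = -math.expm1(math.log1p(-threshold) / n)
    r = max(1, math.ceil(math.log(bound) / math.log(f)))
    # Correct for floating-point rounding at the boundary in either direction.
    while app_failure_prob(n, r, f) > threshold:
        r += 1
    while r > 1 and app_failure_prob(n, r - 1, f) <= threshold:
        r -= 1
    return r


if __name__ == "__main__":
    # Illustrative inputs only, not measurements from the paper.
    for r in range(1, 7):
        print(r, app_failure_prob(256, r, 0.1))
    print("minimal r:", min_replication_degree(256, 0.1, 0.01))
```

With the illustrative inputs n = 256, f = 0.1 and a 1% threshold, the sketch returns r = 5: four replicas per process still give a failure probability of about 2.5%, while five bring it to about 0.26%. Because f grows with the execution length and, in the paper's setting, replication itself lengthens the execution, a faithful treatment would feed a duration model back into f. This sketch deliberately leaves that step out.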