Fault-Management in P2P-MPI
Emmanuel Jeannot | Stéphane Genaud | Choopan Rattanapoka
[1] Thomas Hérault,et al. Computing on large-scale distributed systems: XtremWeb architecture, programming models, security, tests and convergence with grid , 2005, Future Gener. Comput. Syst.
[2] John Paul Walters,et al. A Scalable Asynchronous Replication-Based Strategy for Fault Tolerant MPI Applications , 2007, HiPC.
[3] B. Bouteiller,et al. MPICH-V2: a Fault Tolerant MPI for Volatile Nodes based on Pessimistic Sender Based Message Logging , 2003, ACM/IEEE SC 2003 Conference (SC'03).
[4] Stéphane Genaud,et al. A Peer-to-Peer Framework for Robust Execution of Message Passing Parallel Programs on Grids , 2005, PVM/MPI.
[5] Anthony Skjellum,et al. MPI/FT: A Model-Based Approach to Low-Overhead Fault Tolerant Message-Passing Middleware , 2004, Cluster Computing.
[6] Rob van Nieuwpoort,et al. MPJ/Ibis: A Flexible and Efficient Message Passing Platform for Java , 2005, PVM/MPI.
[7] Message Passing Interface Forum. MPI: A Message-Passing Interface Standard , 1994 .
[8] Jack Dongarra,et al. MPI: The Complete Reference , 1996 .
[9] Xavier Défago,et al. Total order broadcast and multicast algorithms , 2004 .
[10] Georg Stellner,et al. CoCheck: checkpointing and process migration for MPI , 1996, Proceedings of International Conference on Parallel Processing.
[11] Alan D. George,et al. Gossip-Style Failure Detection and Distributed Consensus for Scalable Heterogeneous Clusters , 2004, Cluster Computing.
[12] Nazareno Andrade,et al. Labs of the World, Unite!!! , 2006, Journal of Grid Computing.
[13] Richard Wolski,et al. Modeling Machine Availability in Enterprise and Wide-Area Distributed Computing Environments , 2005, Euro-Par.
[14] Gilles Fedak,et al. XtremWeb: a generic global computing system , 2001, Proceedings First IEEE/ACM International Symposium on Cluster Computing and the Grid.
[15] Jason Duell,et al. The LAM/MPI Checkpoint/Restart Framework: System-Initiated Checkpointing , 2005, Int. J. High Perform. Comput. Appl.
[16] Kazuyuki Shudo,et al. P3: P2P-based middleware enabling transfer and aggregation of computational resources , 2005, CCGrid 2005. IEEE International Symposium on Cluster Computing and the Grid, 2005.
[17] Sam Toueg,et al. Unreliable failure detectors for reliable distributed systems , 1996, JACM.
[18] G. Amdahl. Validity of the single processor approach to achieving large scale computing capabilities , 1967, AFIPS '67 (Spring).
[19] Nazareno Andrade,et al. OurGrid: An Approach to Easily Assemble Grids with Equitable Resource Sharing , 2003, JSSPP.
[20] Mark Baker,et al. MPJ Express: Towards Thread Safe Java HPC , 2006, 2006 IEEE International Conference on Cluster Computing.
[21] Fred B. Schneider,et al. Replication management using the state-machine approach , 1993 .
[22] Rachid Guerraoui,et al. Failure detectors as first class objects , 1999, Proceedings of the International Symposium on Distributed Objects and Applications.
[23] Geoffrey C. Fox,et al. MPJ: MPI-like message passing for Java , 2000 .
[24] Bruce W. Char,et al. Maple V Language Reference Manual , 1993, Springer US.
[25] Jason Maassen,et al. Ibis: a flexible and efficient Java‐based Grid programming environment , 2005, Concurr. Pract. Exp.
[26] Stéphane Genaud,et al. P2P-MPI: A Peer-to-Peer Framework for Robust Execution of Message Passing Parallel Programs on Grids , 2007, Journal of Grid Computing.
[27] Jason Maassen,et al. Ibis: a flexible and efficient Java-based Grid programming environment: Research Articles , 2005 .
[28] Lorenzo Alvisi,et al. Message logging: pessimistic, optimistic, and causal , 1995, Proceedings of 15th International Conference on Distributed Computing Systems.
[29] Robbert van Renesse,et al. A Gossip-Style Failure Detection Service , 2009 .