Reliability in LAM/MPI Requirements Specification

[1]  Jack J. Dongarra,et al.  Visualization and debugging in a heterogeneous environment , 1993, Computer.

[2]  Anthony Skjellum,et al.  MPI/FT™: architecture and taxonomies for fault-tolerant, message-passing middleware for performance-portable parallel computing , 2001, Proceedings First IEEE/ACM International Symposium on Cluster Computing and the Grid.

[3]  William Gropp,et al.  MPI - The Complete Reference: Volume 2, The MPI Extensions , 1998 .

[4]  Greg Burns,et al.  LAM: An Open Cluster Environment for MPI , 2002 .

[5]  Jack Dongarra,et al.  Integrated PVM Framework Supports Heterogeneous Network Computing , 1993 .

[6]  Georg Stellner,et al.  CoCheck: checkpointing and process migration for MPI , 1996, Proceedings of International Conference on Parallel Processing.

[7]  Péter Urbán,et al.  Chasing the FLP impossibility result in a LAN: or, How robust can a fault tolerant server be? , 2001, Proceedings 20th IEEE Symposium on Reliable Distributed Systems.

[8]  Nancy A. Lynch,et al.  Impossibility of distributed consensus with one faulty process , 1983, PODS '83.

[9]  Jack Dongarra,et al.  MPI: The Complete Reference , 1996 .

[10]  Jack J. Dongarra,et al.  FT-MPI: Fault Tolerant MPI, Supporting Dynamic Applications in a Dynamic World , 2000, PVM/MPI.

[11]  Roy Friedman,et al.  Starfish: Fault-Tolerant Dynamic MPI Programs on Clusters of Workstations , 1999, Proceedings. The Eighth International Symposium on High Performance Distributed Computing.

[12]  William Gropp,et al.  Users guide for mpich, a portable implementation of MPI , 1996 .

[13]  Anthony Skjellum,et al.  A High-Performance, Portable Implementation of the MPI Message Passing Interface Standard , 1996, Parallel Comput.

[14]  William Gropp,et al.  MPI: The Complete Reference , Vol. 2 - The MPI-2 Extensions , 1998 .

[15]  William Gropp,et al.  Dynamic process management in an MPI setting , 1995, Proceedings. Seventh IEEE Symposium on Parallel and Distributed Processing.

[16]  Flaviu Cristian,et al.  Atomic Broadcast: From Simple Message Diffusion to Byzantine Agreement , 1995, Inf. Comput..

[17]  Robert B. Ross,et al.  Using MPI-2: Advanced Features of the Message Passing Interface , 2003, CLUSTER.

[18]  Nancy A. Lynch,et al.  Impossibility of distributed consensus with one faulty process , 1985, JACM.

[19]  Jack Dongarra,et al.  PVM: Experiences, current status and future direction , 1993, Supercomputing '93. Proceedings.

[20]  Harrick M. Vin,et al.  Egida: an extensible toolkit for low-overhead fault-tolerance , 1999, Digest of Papers. Twenty-Ninth Annual International Symposium on Fault-Tolerant Computing.

[21]  Geoffrey James.  The Tao of Programming , 1987 .

[22]  P. Merkey,et al.  Beowulf: harnessing the power of parallelism in a pile-of-PCs , 1997, 1997 IEEE Aerospace Conference.

[23]  Sam Toueg,et al.  The weakest failure detector for solving consensus , 1992, PODC '92.

[24]  William Gropp.  The MPI-2 extensions , 1998 .