A Lightweight Message Logging Scheme for Fault Tolerant MPI
[1] Anoop Gupta, et al. The SPLASH-2 programs: characterization and methodological considerations, 1995, ISCA.
[2] Adrianos Lachanas, et al. MPI-FT: Portable Fault Tolerance Scheme for MPI, 2000, Parallel Process. Lett.
[3] Barton P. Miller, et al. Optimal tracing and replay for debugging message-passing parallel programs, 1992, Proceedings Supercomputing '92.
[4] Ian T. Foster, et al. The Globus project: a status report, 1998, Proceedings Seventh Heterogeneous Computing Workshop (HCW'98).
[5] Thomas Hérault, et al. MPICH-V: Toward a Scalable Fault Tolerant MPI for Volatile Nodes, 2002, ACM/IEEE SC 2002 Conference (SC'02).
[6] Alan L. Cox, et al. Lazy release consistency for software distributed shared memory, 1992, ISCA '92.
[7] Jack J. Dongarra, et al. FT-MPI: Fault Tolerant MPI, Supporting Dynamic Applications in a Dynamic World, 2000, PVM/MPI.
[8] Heon Young Yeom, et al. MPICH-GF: Transparent Checkpointing and Rollback-Recovery for Grid-Enabled MPI Processes, 2004, IEICE Trans. Inf. Syst.
[9] Heon Young Yeom, et al. A causal logging scheme for lazy release consistent distributed shared memory systems, 1998, Proceedings 1998 International Conference on Parallel and Distributed Systems (Cat. No.98TB100250).
[10] Heon Young Yeom, et al. An efficient causal logging scheme for recoverable distributed shared memory systems, 2002, Parallel Comput.
[11] Lorenzo Alvisi, et al. Nonblocking and orphan-free message logging protocols, 1992, FTCS-23 The Twenty-Third International Symposium on Fault-Tolerant Computing.
[12] Georg Stellner, et al. CoCheck: checkpointing and process migration for MPI, 1996, Proceedings of International Conference on Parallel Processing.
[13] Richard D. Schlichting, et al. Fail-stop processors: an approach to designing fault-tolerant computing systems, 1983, TOCS.
[14] Anthony Skjellum, et al. MPI/FT™: architecture and taxonomies for fault-tolerant, message-passing middleware for performance-portable parallel computing, 2001, Proceedings First IEEE/ACM International Symposium on Cluster Computing and the Grid.
[15] Roy Friedman, et al. Starfish: Fault-Tolerant Dynamic MPI Programs on Clusters of Workstations, 1999, Proceedings. The Eighth International Symposium on High Performance Distributed Computing (Cat. No.99TH8469).