Practical Fault-Tolerant Framework for eScience Infrastructure
Kiyoung Kim | Heon Young Yeom | Hyuck Han | Youngjin Yu | Jongpil Lee | Jai Wug Kim
[1] Thomas Hérault,et al. MPICH-V: Toward a Scalable Fault Tolerant MPI for Volatile Nodes , 2002, ACM/IEEE SC 2002 Conference (SC'02).
[2] B. Bouteiller,et al. MPICH-V2: a Fault Tolerant MPI for Volatile Nodes based on Pessimistic Sender Based Message Logging , 2003, ACM/IEEE SC 2003 Conference (SC'03).
[3] Ronald Minnich,et al. A Network-Failure-Tolerant Message-Passing System for Terascale Clusters , 2002, ICS '02.
[4] Roy Friedman,et al. Starfish: Fault-Tolerant Dynamic MPI Programs on Clusters of Workstations , 1999, Proceedings. The Eighth International Symposium on High Performance Distributed Computing (Cat. No.99TH8469).
[5] Leslie Lamport,et al. The part-time parliament , 1998, TOCS.
[6] Shigeo Maruyama,et al. Surface Phenomena of Molecular Clusters by Molecular Dynamics Method , 1996 .
[7] Kwang Jin Oh,et al. A general purpose parallel molecular dynamics simulation program , 2006, Comput. Phys. Commun..
[8] Nitin H. Vaidya,et al. Impact of Checkpoint Latency on Overhead Ratio of a Checkpointing Scheme , 1997, IEEE Trans. Computers.
[9] Ravishankar K. Iyer,et al. Modeling coordinated checkpointing for large-scale supercomputers , 2005, 2005 International Conference on Dependable Systems and Networks (DSN'05).
[10] Heon Young Yeom,et al. MPICH-GF: Transparent Checkpointing and Rollback-Recovery for Grid-Enabled MPI Processes , 2004, IEICE Trans. Inf. Syst..
[11] Jeffrey F. Naughton,et al. Real-time, concurrent checkpoint for parallel programs , 1990, PPOPP '90.
[12] William Gropp,et al. Using MPI: Portable Parallel Programming with the Message-Passing Interface , 1994 .
[13] Jack J. Dongarra,et al. FT-MPI: Fault Tolerant MPI, Supporting Dynamic Applications in a Dynamic World , 2000, PVM/MPI.
[14] Heon Young Yeom,et al. Design and Implementation of Multiple Fault-Tolerant MPI over Myrinet (M^3) , 2005, ACM/IEEE SC 2005 Conference (SC'05).
[15] E. N. Elnozahy,et al. Checkpointing for peta-scale systems: a look into the future of practical rollback-recovery , 2004, IEEE Transactions on Dependable and Secure Computing.
[16] Anthony Skjellum,et al. Using MPI - Portable Parallel Programming with the Message-Passing Interface , 1994 .