In-Memory Checkpointing for MPI Programs by XOR-Based Double-Erasure Codes
[1] Kai Li, et al. Diskless Checkpointing, 1998, IEEE Trans. Parallel Distributed Syst.
[2] Garth A. Gibson, et al. RAID: High-Performance, Reliable Secondary Storage, 1994, CSUR.
[3] George Karypis, et al. Introduction to Parallel Computing Solution Manual, 2003.
[4] James S. Plank. A Tutorial on Reed-Solomon Coding for Fault-Tolerance in RAID-like Systems, 1997.
[5] Jack J. Dongarra, et al. FT-MPI: Fault Tolerant MPI, Supporting Dynamic Applications in a Dynamic World, 2000, PVM/MPI.
[6] Thomas Hérault, et al. MPICH-V: Toward a Scalable Fault Tolerant MPI for Volatile Nodes, 2002, ACM/IEEE SC 2002 Conference (SC'02).
[7] James S. Plank. Optimizing Cauchy Reed-Solomon Codes for Fault-Tolerant Storage Applications, 2005.
[8] Message Passing Interface Forum. MPI: A Message-Passing Interface Standard, 1994.
[9] Georg Stellner, et al. CoCheck: Checkpointing and Process Migration for MPI, 1996, Proceedings of International Conference on Parallel Processing.
[10] Zhang Yu, et al. The Performance of Erasure Codes Used in FT-MPI, 2009, 2009 International Forum on Information Technology and Applications.
[11] Mario Blaum. A Family of MDS Array Codes with Minimal Number of Encoding Operations, 2006, 2006 IEEE International Symposium on Information Theory.
[12] C. Colbourn, et al. Handbook of Combinatorial Designs, 2006.
[13] James S. Plank. The RAID-6 Liberation Codes, 2008, FAST.
[14] Jack Dongarra, et al. Recent Advances in Parallel Virtual Machine and Message Passing Interface, 15th European PVM/MPI Users' Group Meeting, Dublin, Ireland, September 7-10, 2008. Proceedings, 2008, PVM/MPI.
[15] Roy Friedman, et al. Starfish: Fault-Tolerant Dynamic MPI Programs on Clusters of Workstations, 1999, Proceedings. The Eighth International Symposium on High Performance Distributed Computing (Cat. No.99TH8469).
[16] Peter F. Corbett, et al. Row-Diagonal Parity for Double Disk Failure Correction (Awarded Best Paper), 2004, USENIX Conference on File and Storage Technologies.
[18] George Bosilca, et al. Fault Tolerant High Performance Computing by a Coding Approach, 2005, PPoPP.