A New Diskless Checkpointing Approach for Multiple Processor Failures
[1] Kai Li,et al. Libckpt: Transparent Checkpointing under UNIX , 1995, USENIX.
[2] Lihao Xu,et al. An efficient XOR-scheduling algorithm for erasure codes encoding , 2009, 2009 IEEE/IFIP International Conference on Dependable Systems & Networks.
[3] E. N. Elnozahy,et al. Checkpointing for peta-scale systems: a look into the future of practical rollback-recovery , 2004, IEEE Transactions on Dependable and Secure Computing.
[4] Luís Moura Silva,et al. Using two-level stable storage for efficient checkpointing , 1998, IEE Proc. Softw..
[5] Kai Li,et al. Faster checkpointing with N+1 parity , 1994, Proceedings of IEEE 24th International Symposium on Fault-Tolerant Computing.
[6] David F. Heidel,et al. An Overview of the BlueGene/L Supercomputer , 2002, ACM/IEEE SC 2002 Conference (SC'02).
[7] John Zahorjan,et al. The challenges of mobile computing , 1994, Computer.
[8] Tong-Ying Tony Juang,et al. An Efficient Asynchronous Recovery Algorithm In Wireless Mobile Ad Hoc Networks , 2002.
[9] Ge-Ming Chiu,et al. Hardware-supported asynchronous checkpointing scheme , 1998.
[10] Yi-Min Wang. Consistent Global Checkpoints that Contain a Given Set of Local Checkpoints , 1997.
[11] Achour Mostéfaoui,et al. Preventing useless checkpoints in distributed computations , 1997, Proceedings of SRDS'97: 16th IEEE Symposium on Reliable Distributed Systems.
[12] Nitin H. Vaidya,et al. A Case for Two-Level Recovery Schemes , 1998, IEEE Trans. Computers.
[13] Tzi-cker Chiueh,et al. Evaluation of checkpoint mechanisms for massively parallel machines , 1996, Proceedings of Annual Symposium on Fault Tolerant Computing.
[14] Stuart I. Feldman,et al. IGOR: a system for program debugging via reversible execution , 1988, PADD '88.
[15] Yookun Cho,et al. Adaptive Mobile Checkpointing Facility for Wireless Sensor Networks , 2006, ICCSA.
[16] Zizhong Chen,et al. A Scalable Checkpoint Encoding Algorithm for Diskless Checkpointing , 2008, 2008 11th IEEE High Assurance Systems Engineering Symposium.
[17] J. Plank. A New MDS Erasure Code for RAID-6 , 2007.
[18] William E. Johnston,et al. Coding for High Availability of a Distributed-Parallel Storage System , 1998, IEEE Trans. Parallel Distributed Syst..
[19] Kai Li,et al. Diskless Checkpointing , 1998, IEEE Trans. Parallel Distributed Syst..
[20] Jack J. Dongarra,et al. Fault-Tolerant Matrix Operations for Networks of Workstations Using Diskless Checkpointing , 1997, J. Parallel Distributed Comput..
[21] Yi-Min Wang,et al. Consistent Global Checkpoints that Contain a Given Set of Local Checkpoints , 1997, IEEE Trans. Computers.
[22] Richard D. Schlichting,et al. Fail-stop processors: an approach to designing fault-tolerant computing systems , 1983, TOCS.
[23] Willy Zwaenepoel,et al. The performance of consistent checkpointing , 1992, Proceedings 11th Symposium on Reliable Distributed Systems.
[24] Jack J. Dongarra,et al. Algorithm-based diskless checkpointing for fault tolerant matrix operations , 1995, Twenty-Fifth International Symposium on Fault-Tolerant Computing. Digest of Papers.
[25] W. Kent Fuchs,et al. CATCH-compiler-assisted techniques for checkpointing , 1990, Digest of Papers. Fault-Tolerant Computing: 20th International Symposium.
[26] Luís Moura Silva,et al. An experimental study about diskless checkpointing , 1998, Proceedings. 24th EUROMICRO Conference.
[27] James S. Plank,et al. Processor Allocation and Checkpoint Interval Selection in Cluster Computing Systems , 2001, J. Parallel Distributed Comput..
[28] John W. Young,et al. A first order approximation to the optimum checkpoint interval , 1974, CACM.
[29] James S. Plank,et al. A tutorial on Reed–Solomon coding for fault‐tolerance in RAID‐like systems , 1997, Softw. Pract. Exp..
[30] Sy-Yen Kuo,et al. More Properties of Communication-Induced Checkpointing Protocols with Rollback-Dependency Trackability , 2005, J. Inf. Sci. Eng..
[31] Ge-Ming Chiu,et al. Placing forced checkpoints in distributed real-time embedded systems , 2002.
[32] George Bosilca,et al. Fault tolerant high performance computing by a coding approach , 2005, PPoPP.
[33] Zizhong Chen,et al. Highly Scalable Self-Healing Algorithms for High Performance Scientific Computing , 2009, IEEE Transactions on Computers.
[34] Wei-Hua Hao,et al. Mutual-Aid: Diskless Checkpointing Scheme for Tolerating Double Faults , 2008, 2008 10th IEEE International Conference on High Performance Computing and Communications.
[35] Ge-Ming Chiu,et al. Efficient Rollback-Recovery Technique in Distributed Computing Systems , 1996, IEEE Trans. Parallel Distributed Syst..
[36] Sy-Yen Kuo,et al. Adaptive Communication-Induced Checkpointing Protocols with Domino-Effect Freedom , 2004, J. Inf. Sci. Eng..
[37] L. Alvisi,et al. A Survey of Rollback-Recovery Protocols , 2002.