Selection of a Checkpoint Interval in Coordinated Checkpointing Protocol for Fault Tolerant Open MPI
[1] K. Mani Chandy, et al. Analytic models for rollback and recovery strategies in data base systems, 1975, IEEE Transactions on Software Engineering.
[2] Luís Moura Silva, et al. The performance of coordinated and independent checkpointing, 1999, Proceedings 13th International Parallel Processing Symposium and 10th Symposium on Parallel and Distributed Processing. IPPS/SPDP 1999.
[3] Jason Duell, et al. Berkeley Lab Checkpoint/Restart (BLCR) for Linux Clusters, 2006.
[4] John T. Daly, et al. A higher order estimate of the optimum checkpoint interval for restart dumps, 2006, Future Gener. Comput. Syst.
[5] K. Mani Chandy, et al. A Survey of Analytic Models of Rollback and Recovery Stratergies, 1975, Computer.
[6] James S. Plank, et al. The average availability of parallel checkpointing systems and its importance in selecting runtime parameters, 1999, Digest of Papers. Twenty-Ninth Annual International Symposium on Fault-Tolerant Computing (Cat. No.99CB36352).
[7] Michael Treaster, et al. A Survey of Fault-Tolerance and Fault-Recovery Techniques in Parallel Systems, 2004, ArXiv.
[8] Jack J. Dongarra, et al. HARNESS and fault tolerant MPI, 2001, Parallel Comput.
[9] Robert Geist, et al. Selection of a checkpoint interval in a critical-task environment, 1988.
[10] E. N. Elnozahy, et al. Checkpointing for peta-scale systems: a look into the future of practical rollback-recovery, 2004, IEEE Transactions on Dependable and Secure Computing.
[11] Stephen L. Scott, et al. An optimal checkpoint/restart model for a large scale high performance computing system, 2008, 2008 IEEE International Symposium on Parallel and Distributed Processing.
[12] John Daly. A Model for Predicting the Optimum Checkpoint Interval for Restart Dumps, 2003, International Conference on Computational Science.
[13] Yudan Liu. Reliability-aware optimal checkpoint/restart model in high performance computing, 2007.
[14] John W. Young, et al. A first order approximation to the optimum checkpoint interval, 1974, CACM.