Research on Optimum Checkpoint Interval for Hybrid Fault Tolerance
[1] John Daly. A Model for Predicting the Optimum Checkpoint Interval for Restart Dumps, 2003, International Conference on Computational Science.
[2] Miroslaw Malek, et al. A survey of online failure prediction methods, 2010, CSUR.
[3] Jack Dongarra, et al. Computational Science — ICCS 2003, 2003, Lecture Notes in Computer Science.
[4] John W. Young, et al. A first order approximation to the optimum checkpoint interval, 1974, CACM.
[5] Bronis R. de Supinski, et al. Design, Modeling, and Evaluation of a Scalable Multi-level Checkpointing System, 2010, 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis.
[6] Rolf Riesen, et al. Fault-tolerance for exascale systems, 2010, 2010 IEEE International Conference On Cluster Computing Workshops and Posters (CLUSTER WORKSHOPS).
[7] Kishor S. Trivedi, et al. Performance Assurance via Software Rejuvenation: Monitoring, Statistics and Algorithms, 2006, International Conference on Dependable Systems and Networks (DSN'06).
[8] Franck Cappello, et al. Fault Tolerance in Petascale/Exascale Systems: Current Knowledge, Challenges and Research Opportunities, 2009, Int. J. High Perform. Comput. Appl.
[9] Nian-Feng Tzeng, et al. Adaptive Incremental Checkpointing via Delta Compression for Networked Multicore Systems, 2013, 2013 IEEE 27th International Symposium on Parallel and Distributed Processing.
[10] John T. Daly, et al. A higher order estimate of the optimum checkpoint interval for restart dumps, 2006, Future Gener. Comput. Syst.
[11] Chao Wang, et al. Proactive process-level live migration and back migration in HPC environments, 2012, J. Parallel Distributed Comput.
[12] Philip S. Yu, et al. Toward Predictive Failure Management for Distributed Stream Processing Systems, 2008, 2008 The 28th International Conference on Distributed Computing Systems.
[13] Emilio Luque, et al. What is Missing in Current Checkpoint Interval Models?, 2011, 2011 31st International Conference on Distributed Computing Systems.
[14] William Gropp, et al. Fault Tolerance in Message Passing Interface Programs, 2004, Int. J. High Perform. Comput. Appl.
[15] Rajeev Thakur, et al. A Meta-Learning Failure Predictor for Blue Gene/L Systems, 2007, 2007 International Conference on Parallel Processing (ICPP 2007).