[1] Laxmikant V. Kalé, et al. A scalable double in-memory checkpoint and restart scheme towards exascale, 2012, IEEE/IFIP International Conference on Dependable Systems and Networks Workshops (DSN 2012).
[2] Cheng-Zhong Xu, et al. Exploring event correlation for failure prediction in coalitions of clusters, 2007, Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (SC '07).
[3] Glenn A. Fink, et al. Predicting Computer System Failures Using Support Vector Machines, 2008, WASL.
[4] Franck Cappello, et al. Taming of the Shrew: Modeling the Normal and Faulty Behaviour of Large-scale HPC Systems, 2012, 2012 IEEE 26th International Parallel and Distributed Processing Symposium.
[5] Franck Cappello, et al. Preventive Migration vs. Preventive Checkpointing for Extreme Scale Supercomputers, 2011, Parallel Process. Lett.
[6] Zhiling Lan, et al. A practical failure prediction with location and lead time for Blue Gene/P, 2010, 2010 International Conference on Dependable Systems and Networks Workshops (DSN-W).
[7] Zhiling Lan, et al. Practical online failure prediction for Blue Gene/P: Period-based vs event-driven, 2011, 2011 IEEE/IFIP 41st International Conference on Dependable Systems and Networks Workshops (DSN-W).
[8] Bianca Schroeder, et al. A Large-Scale Study of Failures in High-Performance Computing Systems, 2010, IEEE Trans. Dependable Secur. Comput.
[9] James H. Laros, et al. Evaluating the viability of process replication reliability for exascale systems, 2011, 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC).
[10] Hui Xiong, et al. Failure Prediction in IBM BlueGene/L Event Logs, 2007, ICDM.
[11] John T. Daly, et al. A higher order estimate of the optimum checkpoint interval for restart dumps, 2006, Future Gener. Comput. Syst.
[12] John W. Young, et al. A first order approximation to the optimum checkpoint interval, 1974, CACM.
[13] Franck Cappello, et al. Modeling and tolerating heterogeneous failures in large parallel systems, 2011, 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC).
[14] Richard P. Martin, et al. Improving cluster availability using workstation validation, 2002, SIGMETRICS '02.