A Meta-Learning Failure Predictor for Blue Gene/L Systems
Rajeev Thakur | Zhiling Lan | Yawei Li | John White | Prashasta Gujrati
[1] Anand Sivasubramaniam,et al. Filtering failure logs for a BlueGene/L prototype , 2005, 2005 International Conference on Dependable Systems and Networks (DSN'05).
[2] Fan Zhang,et al. A statistical approach to predictive detection , 2001, Comput. Networks.
[3] Stewart W. Wilson,et al. Learning Classifier Systems, From Foundations to Applications , 2000 .
[4] Kenny C. Gross,et al. MSET Performance Optimization for Detection of Software Aging , 2003 .
[5] Greg Hamerly,et al. Bayesian approaches to failure prediction for disk drives , 2001, ICML.
[6] Ricardo Vilalta,et al. Predicting rare events in temporal domains , 2002, 2002 IEEE International Conference on Data Mining, 2002. Proceedings.
[7] Anand Sivasubramaniam,et al. Fault-aware job scheduling for BlueGene/L systems , 2004, 18th International Parallel and Distributed Processing Symposium, 2004. Proceedings.
[8] 2007 International Conference on Parallel Processing , 2007 .
[9] Song Jiang,et al. Transparent, Incremental Checkpointing at Kernel Level: a Foundation for Fault Tolerance for Parallel Computers , 2005, ACM/IEEE SC 2005 Conference (SC'05).
[10] Hans Werner Meuer,et al. Top500 Supercomputer Sites , 1997 .
[11] Jon Stearley,et al. What Supercomputers Say: A Study of Five System Logs , 2007, 37th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN'07).
[12] Miroslaw Malek,et al. Advanced Failure Prediction in Complex Software Systems , 2004 .
[13] George L.-T. Chiu,et al. Overview of the Blue Gene/L system architecture , 2005, IBM J. Res. Dev..
[14] Armando Fox,et al. Three Research Challenges at the Intersection of Machine Learning, Statistical Induction, and Systems , 2005, HotOS.
[15] Anand Sivasubramaniam,et al. Critical event prediction for proactive management in large-scale computer clusters , 2003, KDD '03.
[16] Rakesh Agrawal,et al. Fast Algorithms for Mining Association Rules , 1994, VLDB 1994.
[17] E. N. Elnozahy,et al. Checkpointing for peta-scale systems: a look into the future of practical rollback-recovery , 2004, IEEE Transactions on Dependable and Secure Computing.
[18] Zhiling Lan,et al. Exploit failure prediction for adaptive fault-tolerance in cluster computing , 2006, Sixth IEEE International Symposium on Cluster Computing and the Grid (CCGRID'06).
[19] Laxmikant V. Kale,et al. Proactive Fault Tolerance in Large Systems , 2004 .
[20] Ravishankar K. Iyer,et al. Recognition of Error Symptoms in Large Systems , 1986, FJCC.
[21] Douglas G. Turnbull. Failure Prediction in Hardware Systems , 2022 .
[22] Kishor S. Trivedi,et al. A measurement-based model for estimation of resource exhaustion in operational software systems , 1999, Proceedings 10th International Symposium on Software Reliability Engineering (Cat. No.PR00443).
[23] R. Polikar,et al. Ensemble based systems in decision making , 2006, IEEE Circuits and Systems Magazine.
[24] Jian Pei,et al. Mining Frequent Patterns without Candidate Generation: A Frequent-Pattern Tree Approach , 2006, Sixth IEEE International Conference on Data Mining - Workshops (ICDMW'06).
[25] Adolfy Hoisie,et al. Use of Predictive Performance Modeling during Large-scale System Installation , 2005, Parallel Process. Lett..
[26] Kishor S. Trivedi,et al. Probabilistic modeling of computer system availability , 1987 .
[27] Anand Sivasubramaniam,et al. BlueGene/L Failure Analysis and Prediction Models , 2006, International Conference on Dependable Systems and Networks (DSN'06).