Establishing Hypothesis for Recurrent System Failures from Cluster Log Files
Edward Chuah | Tommy Minyard | James C. Browne | John Hammond | William-Chandra Tjhi | Gary Kee Khoon Lee | Terence Hung | Shyh-Hao Kuo
[1] Saharon Rosset, et al. Analyzing system logs: a new view of what's important, 2007.
[2] Jon Stearley, et al. What Supercomputers Say: A Study of Five System Logs, 2007, 37th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN'07).
[3] Rajeev Thakur, et al. A Fault Diagnosis and Prognosis Service for TeraGrid Clusters, 2007.
[4] Mohamed Kaâniche, et al. Availability assessment of SunOS/Solaris Unix systems based on syslogd and wtmpx log files: A case study, 2005, 11th Pacific Rim International Symposium on Dependable Computing (PRDC'05).
[5] Hui Xiong, et al. Failure Prediction in IBM BlueGene/L Event Logs, 2007, ICDM.
[6] Zhiling Lan, et al. A practical failure prediction with location and lead time for Blue Gene/P, 2010, 2010 International Conference on Dependable Systems and Networks Workshops (DSN-W).
[7] Eric A. Brewer, et al. Pinpoint: problem determination in large, dynamic Internet services, 2002, Proceedings International Conference on Dependable Systems and Networks.
[8] Rajeev Thakur, et al. A study of dynamic meta-learning for failure prediction in large-scale systems, 2010, J. Parallel Distributed Comput.
[9] Tommy Minyard, et al. End-to-end framework for fault management for open source clusters: Ranger, 2010, TG.
[10] David A. Patterson, et al. Path-Based Failure and Evolution Management, 2004, NSDI.
[11] Cheng-Zhong Xu, et al. Exploring event correlation for failure prediction in coalitions of clusters, 2007, Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (SC '07).
[12] Michal Aharon, et al. One Graph Is Worth a Thousand Logs: Uncovering Hidden Structures in Massive System Event Logs, 2009, ECML/PKDD.
[13] Stephen E. Hansen, et al. Automated System Monitoring and Notification with Swatch, 1993, LISA.
[14] Christopher D. Carothers, et al. An analysis of clustered failures on large supercomputing systems, 2009, J. Parallel Distributed Comput.
[15] Jon Stearley, et al. Towards informatic analysis of syslogs, 2004, 2004 IEEE International Conference on Cluster Computing (IEEE Cat. No.04EX935).
[16] Jon Stearley, et al. Bad Words: Finding Faults in Spirit's Syslogs, 2008, 2008 Eighth IEEE International Symposium on Cluster Computing and the Grid (CCGRID).
[17] Ravishankar K. Iyer, et al. Recognition of Error Symptoms in Large Systems, 1986, FJCC.
[18] Rajeev Gandhi, et al. Visual, Log-Based Causal Tracing for Performance Debugging of MapReduce Systems, 2010, 2010 IEEE 30th International Conference on Distributed Computing Systems.
[19] Daniel P. Siewiorek, et al. Models for time coalescence in event logs, 1992, [1992] Digest of Papers. FTCS-22: The Twenty-Second International Symposium on Fault-Tolerant Computing.
[20] Daniel P. Siewiorek, et al. Error log analysis: statistical modeling and heuristic trend analysis, 1990.
[21] Anand Sivasubramaniam, et al. BlueGene/L Failure Analysis and Prediction Models, 2006, International Conference on Dependable Systems and Networks (DSN'06).
[22] Zhiling Lan, et al. System log pre-processing to improve failure prediction, 2009, 2009 IEEE/IFIP International Conference on Dependable Systems & Networks.
[23] Rajeev Gandhi, et al. Kahuna: Problem diagnosis for Mapreduce-based cloud computing environments, 2010, 2010 IEEE Network Operations and Management Symposium - NOMS 2010.
[24] Zhiling Lan, et al. Toward Automated Anomaly Identification in Large-Scale Systems, 2010, IEEE Transactions on Parallel and Distributed Systems.
[25] Bianca Schroeder, et al. A Large-Scale Study of Failures in High-Performance Computing Systems, 2006, IEEE Transactions on Dependable and Secure Computing.
[26] Alexander Aiken, et al. Using correlated surprise to infer shared influence, 2010, 2010 IEEE/IFIP International Conference on Dependable Systems & Networks (DSN).
[27] Edward Chuah, et al. Diagnosing the root-causes of failures from cluster log files, 2010, 2010 International Conference on High Performance Computing.