CRUDE: Combining Resource Usage Data and Error Logs for Accurate Error Detection in Large-Scale Distributed Systems
Arshad Jhumka | Edward Chuah | Nentawe Gurumdimma | Maria Liakata | James C. Browne
[1] Evangelos E. Milios, et al. An Evaluation of Entropy Based Approaches to Alert Detection in High Performance Cluster Logs, 2010, 2010 Seventh International Conference on the Quantitative Evaluation of Systems.
[2] Arshad Jhumka, et al. Towards Detecting Patterns in Failure Logs of Large-Scale Distributed Systems, 2015, 2015 IEEE International Parallel and Distributed Processing Symposium Workshop.
[3] Ling Huang, et al. Mining Console Logs for Large-Scale System Problem Detection, 2008, SysML.
[4] A. Nur Zincir-Heywood, et al. Fast entropy based alert detection in super computer logs, 2010, 2010 International Conference on Dependable Systems and Networks Workshops (DSN-W).
[5] Mark Crovella, et al. Mining anomalies using traffic feature distributions, 2005, SIGCOMM '05.
[6] Carl E. Landwehr, et al. Basic concepts and taxonomy of dependable and secure computing, 2004, IEEE Transactions on Dependable and Secure Computing.
[7] Bianca Schroeder, et al. Reading between the lines of failure logs: Understanding how HPC systems fail, 2013, 2013 43rd Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN).
[8] Zhiling Lan, et al. Exploring void search for fault detection on extreme scale systems, 2014, 2014 IEEE International Conference on Cluster Computing (CLUSTER).
[9] Zhiling Lan, et al. Toward Automated Anomaly Identification in Large-Scale Systems, 2010, IEEE Transactions on Parallel and Distributed Systems.
[10] Jon Stearley, et al. What Supercomputers Say: A Study of Five System Logs, 2007, 37th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN'07).
[11] Michael I. Jordan, et al. Detecting large-scale system problems by mining console logs, 2009, SOSP '09.
[12] Evangelos E. Milios, et al. System State Discovery Via Information Content Clustering of System Logs, 2011, 2011 Sixth International Conference on Availability, Reliability and Security.
[13] Alexander Aiken, et al. Alert Detection in System Logs, 2008, 2008 Eighth IEEE International Conference on Data Mining.
[14] Franck Cappello, et al. Fault prediction under the microscope: A closer look into HPC systems, 2012, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis.
[15] Zhenbang Chen, et al. Identifying faults in large-scale distributed systems by filtering noisy error logs, 2011, 2011 IEEE/IFIP 41st International Conference on Dependable Systems and Networks Workshops (DSN-W).
[16] Jianfeng Zhan, et al. LogMaster: Mining Event Correlations in Logs of Large-Scale Cluster Systems, 2010, 2012 IEEE 31st Symposium on Reliable Distributed Systems.
[17] Tommy Minyard, et al. End-to-end framework for fault management for open source clusters: Ranger, 2010, TG.
[18] Yuh-Jye Lee, et al. Anomaly Detection via Online Oversampling Principal Component Analysis, 2013, IEEE Transactions on Knowledge and Data Engineering.
[19] Arshad Jhumka, et al. Linking Resource Usage Anomalies with System Failures from Cluster Log Data, 2013, 2013 IEEE 32nd International Symposium on Reliable Distributed Systems.
[20] Glenn A. Fink, et al. Predicting Computer System Failures Using Support Vector Machines, 2008, WASL.
[21] Franck Cappello, et al. Taming of the Shrew: Modeling the Normal and Faulty Behaviour of Large-scale HPC Systems, 2012, 2012 IEEE 26th International Parallel and Distributed Processing Symposium.