UiLog: Improving Log-Based Fault Diagnosis by Log Analysis

In modern computer systems, system event logs have long been the primary source for checking system status. As computer systems grow more complex, interactions between software and hardware become more frequent, and components generate enormous volumes of log data, including runtime reports and fault information. The sheer quantity of data poses a great challenge for manual analysis. In this paper, we implement a log management and analysis system that helps system administrators understand the real-time status of the entire system, classify logs into different fault types, and determine the root causes of faults. In addition, we improve the existing fault correlation analysis method by building on the results of system log classification. We evaluate the system in a cloud computing environment. The results show that our system classifies fault logs automatically and effectively. With the proposed system, administrators can easily detect the root causes of faults.
