Online System Problem Detection by Mining Patterns of Console Logs

We describe a novel application of data mining and statistical learning methods to automatically monitor console logs and detect abnormal execution traces in an online setting. Unlike existing solutions, we use a two-stage detection system. The first stage uses frequent pattern mining and distribution estimation techniques to capture the dominant patterns (both frequent sequences and their time durations). The second stage uses a principal component analysis (PCA) based anomaly detection technique to identify actual problems. Using real system data from a 203-node Hadoop [1] cluster, we show that we can not only achieve highly accurate and fast problem detection, but also help operators better understand execution patterns in their system.
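The first stage described above can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes each trace is a list of (timestamp, event) pairs already grouped by some identifier, treats a "pattern" as an exact event sequence, and uses a plain empirical quantile in place of the paper's distribution estimation. All function names and the `min_support` and `quantile` parameters are hypothetical.

```python
from collections import Counter


def event_sequence(trace):
    """Drop timestamps and keep only the ordered event names."""
    return tuple(event for _, event in trace)


def duration(trace):
    """Elapsed time between the first and last event of a trace."""
    return trace[-1][0] - trace[0][0]


def mine_dominant_patterns(traces, min_support=0.2):
    """Return event sequences that account for at least min_support of all traces."""
    counts = Counter(event_sequence(t) for t in traces)
    total = len(traces)
    return {seq for seq, n in counts.items() if n / total >= min_support}


def duration_threshold(traces, patterns, quantile=0.95):
    """Empirical quantile of durations over traces that match a dominant pattern."""
    durations = sorted(duration(t) for t in traces if event_sequence(t) in patterns)
    if not durations:
        raise ValueError("no trace matches a dominant pattern")
    index = min(len(durations) - 1, int(quantile * len(durations)))
    return durations[index]


def passes_first_stage(trace, patterns, max_duration):
    """True if the trace follows a dominant pattern within the usual duration.

    Traces that return False would be handed to the second (PCA-based) stage,
    which this sketch does not implement.
    """
    return event_sequence(trace) in patterns and duration(trace) <= max_duration
```

In this sketch, only traces that fail `passes_first_stage` need further analysis, which mirrors the idea of filtering out dominant behavior cheaply before running a more expensive detector.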

[1]  D. Brillinger,et al.  Handbook of methods of applied statistics , 1967 .

[2]  J. E. Jackson,et al.  Control Procedures for Residuals Associated With Principal Component Analysis , 1979 .

[3]  Stephen E. Hansen,et al.  Automated System Monitoring and Notification with Swatch , 1993, LISA.

[4]  Ramakrishnan Srikant,et al.  Mining Sequential Patterns: Generalizations and Performance Improvements , 1996, EDBT.

[5]  Michalis Faloutsos,et al.  On power-law relationships of the Internet topology , 1999 .

[6]  Joseph L. Hellerstein,et al.  Mining partially periodic event patterns with unknown periods , 2001, Proceedings 17th International Conference on Data Engineering.

[7]  Qiming Chen,et al.  PrefixSpan: mining sequential patterns efficiently by prefix-projected pattern growth , 2001, Proceedings 17th International Conference on Data Engineering.

[8]  Joseph L. Hellerstein,et al.  Discovering actionable patterns in event data , 2002, IBM Syst. J.

[9]  Eric A. Brewer,et al.  Pinpoint: problem determination in large, dynamic Internet services , 2002, Proceedings International Conference on Dependable Systems and Networks.

[10]  Risto Vaarandi,et al.  A data clustering algorithm for mining patterns from event logs , 2003, Proceedings of the 3rd IEEE Workshop on IP Operations & Management (IPOM 2003) (IEEE Cat. No.03EX764).

[11]  James E. Prewett.  Analyzing cluster log files using Logsurfer , 2003 .

[12]  Yuanyuan Zhou,et al.  CP-Miner: A Tool for Finding Copy-paste and Related Bugs in Operating System Code , 2004, OSDI.

[13]  Mohammed J. Zaki,et al.  SPADE: An Efficient Algorithm for Mining Frequent Sequences , 2004, Machine Learning.

[14]  Helen J. Wang,et al.  Automatic Misconfiguration Troubleshooting with PeerPressure , 2004, OSDI.

[15]  John Stearley,et al.  Towards informatic analysis of syslogs , 2004, 2004 IEEE International Conference on Cluster Computing (IEEE Cat. No.04EX935).

[16]  Risto Vaarandi,et al.  A Breadth-First Algorithm for Mining Frequent Patterns from Event Logs , 2004, INTELLCOMM.

[17]  David A. Patterson,et al.  Path-Based Failure and Evolution Management , 2004, NSDI.

[18]  Christophe Diot,et al.  Diagnosing network-wide traffic anomalies , 2004, SIGCOMM.

[19]  Kuai Xu,et al.  Profiling Internet Backbone Traffic: Behavior Models and Applications , 2005 .

[20]  Zhi-Li Zhang,et al.  Profiling internet backbone traffic: behavior models and applications , 2005, SIGCOMM '05.

[21]  Kenji Yamanishi,et al.  Dynamic syslog mining for network failure monitoring , 2005, KDD '05.

[22]  Jiawei Han,et al.  Frequent pattern mining: current status and future directions , 2007, Data Mining and Knowledge Discovery.

[23]  Michael I. Jordan,et al.  Statistical debugging: simultaneous identification of multiple bugs , 2006, ICML '06.

[24]  Sören Sonnenburg,et al.  Computation of Similarity Measures for Sequential Data using Generalized Suffix Trees , 2006, NIPS.

[25]  Randy H. Katz,et al.  X-Trace: A Pervasive Network Tracing Framework , 2007, NSDI.

[26]  Michalis Faloutsos,et al.  Profiling the End Host , 2007, PAM.

[27]  Yanfang Ye,et al.  IMDS: intelligent malware detection system , 2007, KDD '07.

[28]  Li Xiong,et al.  Frequent pattern mining for kernel trace data , 2008, SAC '08.

[29]  Diomidis Spinellis,et al.  Power laws in software , 2008, TSEM.

[30]  Ling Huang,et al.  Large-Scale System Problems Detection by Mining Console Logs , 2009 .

[31]  Mark E. J. Newman,et al.  Power-Law Distributions in Empirical Data , 2007, SIAM Rev.

[32]  Michael I. Jordan,et al.  Detecting large-scale system problems by mining console logs , 2009, SOSP '09.