Execution anomaly detection in large-scale systems through console log analysis

Abstract Execution anomaly detection is important for development, maintenance and performance tuning in large-scale systems. System console logs are the significant source of troubleshooting and problem diagnosis. However, manually inspecting logs to detect anomalies is unfeasible due to the increasing volume and complexity of log files. Therefore, this is a substantial demand for automatic anomaly detection based on log analysis. In this paper, we propose a general method to mine console logs to detect system problems. We first give some formal definitions of the problem, and then extract the set of log statements in the source code and generate the reachability graph to reveal the reachable relations of log statements. After that, we parse the log files to create log messages by combining information about log statements with information retrieval techniques. These messages are grouped into execution traces according to their execution units. We propose a novel anomaly detection algorithm that considers traces as sequence data and uses a probabilistic suffix tree based method to organize and differentiate significant statistical properties possessed by the sequences. Experiments on a CloudStack testbed and a Hadoop production system show that our method can effectively detect running anomalies in comparison with existing four detection algorithms.

[1]  James R. Larus,et al.  Mining specifications , 2002, POPL '02.

[2]  So Young Sohn,et al.  Random effects logistic regression model for anomaly detection , 2007, Expert Syst. Appl..

[3]  Christophe Diot,et al.  Diagnosing network-wide traffic anomalies , 2004, SIGCOMM.

[4]  Risto Vaarandi,et al.  LogCluster - A data clustering and pattern mining algorithm for event logs , 2015, 2015 11th International Conference on Network and Service Management (CNSM).

[5]  Qiang Fu,et al.  Mining program workflow from interleaved traces , 2010, KDD.

[6]  William G. Griswold,et al.  Dynamically discovering likely program invariants to support program evolution , 1999, Proceedings of the 1999 International Conference on Software Engineering (IEEE Cat. No.99CB37002).

[7]  Ying Li,et al.  An Approach for Anomaly Diagnosis Based on Hybrid Graph Model with Logs for Distributed Services , 2017, 2017 IEEE International Conference on Web Services (ICWS).

[8]  Amutha Prabakar Muniyandi,et al.  Network Anomaly Detection by Cascading K-Means Clustering and C4.5 Decision Tree algorithm , 2012 .

[9]  Risto Vaarandi,et al.  A data clustering algorithm for mining patterns from event logs , 2003, Proceedings of the 3rd IEEE Workshop on IP Operations & Management (IPOM 2003) (IEEE Cat. No.03EX764).

[10]  Yun Wang,et al.  A multinomial logistic regression modeling approach for anomaly intrusion detection , 2005, Comput. Secur..

[11]  Dong Yu,et al.  Deep Learning: Methods and Applications , 2014, Found. Trends Signal Process..

[12]  Yongdae Kim,et al.  A machine learning framework for network anomaly detection using SVM and GA , 2005, Proceedings from the Sixth Annual IEEE SMC Information Assurance Workshop.

[13]  Shilin He,et al.  Experience Report: System Log Analysis for Anomaly Detection , 2016, 2016 IEEE 27th International Symposium on Software Reliability Engineering (ISSRE).

[14]  Daniel P. Siewiorek,et al.  Error log analysis: statistical modeling and heuristic trend analysis , 1990 .

[15]  Navjot Singh,et al.  A log mining approach to failure analysis of enterprise telephony systems , 2008, 2008 IEEE International Conference on Dependable Systems and Networks With FTCS and DCC (DSN).

[16]  VARUN CHANDOLA,et al.  Anomaly detection: A survey , 2009, CSUR.

[17]  Qiang Fu,et al.  Mining Invariants from Console Logs for System Problem Detection , 2010, USENIX Annual Technical Conference.

[18]  Feifei Li,et al.  DeepLog: Anomaly Detection and Diagnosis from System Logs through Deep Learning , 2017, CCS.

[19]  Hans-Peter Kriegel,et al.  A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise , 1996, KDD.

[20]  Sudheendra Hangal,et al.  Tracking down software bugs using automatic anomaly detection , 2002, ICSE '02.

[21]  Zied Elouedi,et al.  Naive Bayes vs decision trees in intrusion detection systems , 2004, SAC '04.

[22]  Rajeev Gandhi,et al.  Mochi: Visual Log-Analysis Based Tools for Debugging Hadoop , 2009, HotCloud.

[23]  Xiao Yu,et al.  CloudSeer: Workflow Monitoring of Cloud Infrastructures via Interleaved Logs , 2016, ASPLOS.

[24]  Don R. Hush,et al.  A Classification Framework for Anomaly Detection , 2005, J. Mach. Learn. Res..

[25]  Frances E. Allen,et al.  Control-flow analysis , 2022 .

[26]  Edward Chuah,et al.  Diagnosing the root-causes of failures from cluster log files , 2010, 2010 International Conference on High Performance Computing.

[27]  Lin Yang,et al.  LOGAN: Problem Diagnosis in the Cloud Using Log-Based Reference Models , 2016, 2016 IEEE International Conference on Cloud Engineering (IC2E).

[28]  Jiong Yang,et al.  CLUSEQ: efficient and effective sequence clustering , 2003, Proceedings 19th International Conference on Data Engineering (Cat. No.03CH37405).

[29]  Wei Xu,et al.  Improving one-class SVM for anomaly detection , 2003, Proceedings of the 2003 International Conference on Machine Learning and Cybernetics (IEEE Cat. No.03EX693).

[30]  Yu Zhang,et al.  Log Clustering Based Problem Identification for Online Service Systems , 2016, 2016 IEEE/ACM 38th International Conference on Software Engineering Companion (ICSE-C).

[31]  Fei Wu,et al.  Structural Event Detection from Log Messages , 2017, KDD.

[32]  Jerome A. Feldman,et al.  On the Synthesis of Finite-State Machines from Samples of Their Behavior , 1972, IEEE Transactions on Computers.

[33]  Yasser Yasami,et al.  A novel unsupervised classification approach for network anomaly detection by k-Means clustering and ID3 decision tree learning methods , 2010, The Journal of Supercomputing.

[34]  John P. Rouillard Real-time Log File Analysis Using the Simple Event Correlator (SEC) , 2004, LISA.

[35]  Leonardo Mariani,et al.  Automated Identification of Failure Causes in System Logs , 2008, 2008 19th International Symposium on Software Reliability Engineering (ISSRE).

[36]  David Leon,et al.  Finding failures by cluster analysis of execution profiles , 2001, Proceedings of the 23rd International Conference on Software Engineering. ICSE 2001.

[37]  Dewan Md. Farid,et al.  Combining Naive Bayes and Decision Tree for Adaptive Intrusion Detection , 2010, ArXiv.

[38]  Kenji Yamanishi,et al.  Dynamic syslog mining for network failure monitoring , 2005, KDD '05.

[39]  Ke Zhang,et al.  2016 Ieee International Conference on Big Data (big Data) Automated It System Failure Prediction: a Deep Learning Approach , 2022 .

[40]  Jon Stearley,et al.  What Supercomputers Say: A Study of Five System Logs , 2007, 37th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN'07).

[41]  Ying Li,et al.  LogSed: Anomaly Diagnosis through Mining Time-Weighted Control Flow Graph in Logs , 2017, 2017 IEEE 10th International Conference on Cloud Computing (CLOUD).

[42]  Hui Xiong,et al.  Failure Prediction in IBM BlueGene/L Event Logs , 2007, ICDM.

[43]  Ling Huang,et al.  Online System Problem Detection by Mining Patterns of Console Logs , 2009, 2009 Ninth IEEE International Conference on Data Mining.

[44]  Michael I. Jordan,et al.  Detecting large-scale system problems by mining console logs , 2009, SOSP '09.

[45]  Domenico Cotroneo,et al.  Investigation of failure causes in workload-driven reliability testing , 2007, SOQUA '07.

[46]  Qiang Fu,et al.  Execution Anomaly Detection in Distributed Systems through Unstructured Log Analysis , 2009, 2009 Ninth IEEE International Conference on Data Mining.

[47]  Yuriy Brun,et al.  Inferring models of concurrent systems from logs of their behavior with CSight , 2014, ICSE.

[48]  Xiao Zhang,et al.  DriftInsight: Detecting Anomalous Behaviors in Large-Scale Cloud Platform , 2017, 2017 IEEE 10th International Conference on Cloud Computing (CLOUD).

[49]  Peter Zoeteweij,et al.  Automatic software fault localization using generic program invariants , 2008, SAC '08.

[50]  Yuanyuan Zhou,et al.  Understanding Customer Problem Troubleshooting from Storage System Logs , 2009, FAST.

[51]  Ling Huang,et al.  Mining Console Logs for Large-Scale System Problem Detection , 2008, SysML.

[52]  Risto Vaarandi,et al.  A Breadth-First Algorithm for Mining Frequent Patterns from Event Logs , 2004, INTELLCOMM.

[53]  Ravishankar K. Iyer,et al.  Recognition of Error Symptoms in Large Systems , 1986, FJCC.

[54]  David Lo,et al.  Automatic steering of behavioral model inference , 2009, ESEC/SIGSOFT FSE.

[55]  Manish Kumar,et al.  PerfAugur: Robust diagnostics for performance anomalies in cloud services , 2015, 2015 IEEE 31st International Conference on Data Engineering.

[56]  Kien A. Hua,et al.  Decision tree classifier for network intrusion detection with GA-based feature selection , 2005, ACM Southeast Regional Conference.

[57]  William K. Robertson,et al.  Beehive: large-scale log analysis for detecting suspicious activity in enterprise networks , 2013, ACSAC.

[58]  James E. Prewett Analyzing cluster log files using Logsurfer , 2003 .

[59]  Gilbert Hamann,et al.  An automated approach for abstracting execution logs to execution events , 2008 .

[60]  Yu Luo,et al.  lprof: A Non-intrusive Request Flow Profiler for Distributed Systems , 2014, OSDI.

[61]  Leonardo Mariani,et al.  AVA: automated interpretation of dynamically detected anomalies , 2009, ISSTA.

[62]  Qiang Fu,et al.  Contextual analysis of program logs for understanding system behaviors , 2013, 2013 10th Working Conference on Mining Software Repositories (MSR).

[63]  Wei Xu,et al.  Advances and challenges in log analysis , 2011, Commun. ACM.

[64]  Michael I. Jordan,et al.  Failure diagnosis using decision trees , 2004 .

[65]  Evangelos E. Milios,et al.  Clustering event logs using iterative partitioning , 2009, KDD.

[66]  Sarita V. Adve,et al.  Using likely program invariants to detect hardware errors , 2008, 2008 IEEE International Conference on Dependable Systems and Networks With FTCS and DCC (DSN).

[67]  Eric A. Brewer,et al.  Pinpoint: problem determination in large, dynamic Internet services , 2002, Proceedings International Conference on Dependable Systems and Networks.

[68]  Zhou Li,et al.  Detection of Early-Stage Enterprise Infection by Mining Large-Scale Log Data , 2014, 2015 45th Annual IEEE/IFIP International Conference on Dependable Systems and Networks.

[69]  Patrick Martin,et al.  Assisting developers of Big Data Analytics Applications when deploying on Hadoop clouds , 2013, 2013 35th International Conference on Software Engineering (ICSE).

[70]  Ingo Weber,et al.  Experience report: Anomaly detection of cloud application operations using log and cloud metric correlation analysis , 2015, 2015 IEEE 26th International Symposium on Software Reliability Engineering (ISSRE).

[71]  Domenico Cotroneo,et al.  Event Logs for the Analysis of Software Failures: A Rule-Based Approach , 2013, IEEE Transactions on Software Engineering.

[72]  Neil Walkinshaw,et al.  Inferring Finite-State Models with Temporal Constraints , 2008, 2008 23rd IEEE/ACM International Conference on Automated Software Engineering.

[73]  Jacques Wainer,et al.  Algorithms for anomaly detection of traces in logs of process aware information systems , 2013, Inf. Syst..

[74]  Dongmei Zhang,et al.  iDice: Problem Identification for Emerging Issues , 2016, 2016 IEEE/ACM 38th International Conference on Software Engineering (ICSE).

[75]  Stephen E. Hansen,et al.  Automated System Monitoring and Notification with Swatch , 1993, LISA.

[76]  Guofei Gu,et al.  Using an Ensemble of One-Class SVM Classifiers to Harden Payload-based Anomaly Detection Systems , 2006, Sixth International Conference on Data Mining (ICDM'06).

[77]  Leonardo Mariani,et al.  Automatic generation of software behavioral models , 2008, 2008 ACM/IEEE 30th International Conference on Software Engineering.