LogSed: Anomaly Diagnosis through Mining Time-Weighted Control Flow Graph in Logs

Detecting execution anomalies is essential to monitoring and maintaining cloud systems. Operators often troubleshoot and diagnose problems by reading execution logs manually, which is time-consuming and error-prone, so there is great demand for automatic log-based anomaly detection. In this paper, we mine a time-weighted control flow graph (TCFG) that captures the healthy execution flows of each component in a cloud system, and we automatically raise anomaly alerts when observed behavior deviates from the TCFG. We address three challenges: how to deal with the interleaving of multiple threads in logs, how to identify operational logs that do not contain any transactional information, and how to delimit the boundaries of each transaction flow in the TCFG. We evaluate the effectiveness of our approach using logs from an IBM public cloud production platform and two simulated systems in a lab environment. The evaluation results show that our TCFG mining and anomaly diagnosis both achieve over 80% precision and recall on average.
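
As a rough illustration of the idea, the sketch below builds a small time-weighted graph from healthy traces and flags deviations in a new trace. It is not the LogSed algorithm: it assumes each trace is already a single, non-interleaved transaction of (timestamp, template_id) pairs, and the function names, the `slack` factor, and the use of the maximum observed gap as the edge weight are all assumptions made for illustration.

```python
from collections import defaultdict


def mine_tcfg(traces, slack=1.5):
    """Build a toy time-weighted graph from healthy traces.

    traces: list of traces, each a list of (timestamp, template_id) pairs.
    Returns a dict mapping an edge (a, b) to the allowed transition time,
    taken here (as an assumption) to be the largest observed gap times `slack`.
    """
    max_gap = defaultdict(float)
    for trace in traces:
        # Walk consecutive log events and record the longest gap per edge.
        for (t0, a), (t1, b) in zip(trace, trace[1:]):
            max_gap[(a, b)] = max(max_gap[(a, b)], t1 - t0)
    return {edge: gap * slack for edge, gap in max_gap.items()}


def find_anomalies(tcfg, trace):
    """Report transitions that are missing from the graph or too slow."""
    anomalies = []
    for (t0, a), (t1, b) in zip(trace, trace[1:]):
        limit = tcfg.get((a, b))
        if limit is None:
            anomalies.append(("unexpected_transition", a, b))
        elif t1 - t0 > limit:
            anomalies.append(("slow_transition", a, b))
    return anomalies


healthy = [
    [(0.0, "A"), (1.0, "B"), (2.0, "C")],
    [(0.0, "A"), (2.0, "B"), (3.0, "C")],
]
tcfg = mine_tcfg(healthy)
slow = find_anomalies(tcfg, [(0.0, "A"), (5.0, "B"), (5.5, "C")])
skipped = find_anomalies(tcfg, [(0.0, "A"), (1.0, "C")])
```

In this toy setting, a transition that never appeared in the healthy traces is reported as a sequence deviation, and a known transition that takes longer than its edge weight is reported as a timing deviation. Handling interleaved threads, operational logs, and transaction boundaries, the three challenges named in the abstract, is outside the scope of this sketch.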
