An Approach for Anomaly Diagnosis Based on Hybrid Graph Model with Logs for Distributed Services

Detecting runtime anomalies is essential to the monitoring and maintenance of distributed services. Operators often inspect execution logs manually for troubleshooting and problem diagnosis, which is time-consuming and error-prone. In this paper, we propose an approach for automatic anomaly detection based on logs. We first mine a hybrid graph model that captures normal execution flows both across and within services, and then raise anomaly alerts when we observe deviations from the hybrid model. We evaluate the effectiveness of our approach using logs from an IBM public cloud production platform and from two simulated systems in a lab environment. Evaluation results show that, on average, our hybrid graph model mining achieves over 80% precision and 70% recall, and anomaly detection achieves nearly 90% precision and 80% recall.
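
The two-step idea of mining a model of normal flows and then flagging deviations can be illustrated with a deliberately simplified sketch. It is not the hybrid graph model of the paper: it assumes each trace is already parsed into an ordered list of event names, it models only direct successor relations, and the function names `mine_transitions` and `find_deviations` as well as the event names are hypothetical.

```python
from collections import defaultdict


def mine_transitions(normal_traces):
    """Build a graph of every (event -> next event) pair seen in normal traces.

    Assumption: each trace is an ordered list of event names from one request.
    """
    graph = defaultdict(set)
    for trace in normal_traces:
        for cur, nxt in zip(trace, trace[1:]):
            graph[cur].add(nxt)
    return graph


def find_deviations(graph, trace):
    """Return the transitions in `trace` that never occurred in normal traces."""
    return [
        (cur, nxt)
        for cur, nxt in zip(trace, trace[1:])
        if nxt not in graph.get(cur, set())
    ]


# Hypothetical event names, for illustration only.
normal = [
    ["recv", "auth", "query", "reply"],
    ["recv", "auth", "reply"],
]
graph = mine_transitions(normal)
alerts = find_deviations(graph, ["recv", "query", "reply"])
print(alerts)  # the unseen transition recv -> query is reported
```

A trace that skips the `auth` step yields one reported deviation, while a trace matching a mined flow yields none. A real model would also need to handle interleaved logs, timing, and cross-service dependencies, which this sketch ignores.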
