DLME: Distributed Log Mining Using Ensemble Learning for Fault Prediction

Fault prediction in network systems is often an onerous task, yet it is essential for effective network management. One effective measure is to constantly monitor and analyze the continuous stream of network logs that capture the activities of a network. Learning algorithms are well suited to this purpose. However, because network systems are dynamic, the logged data may drift frequently, which in turn degrades the effectiveness of the learning algorithms. In this paper, we present a general-purpose algorithmic framework for developing an easily parallelizable distributed log mining approach, which uses machine learning and distributed processing to achieve a better quality of network services. Our proposed approach continuously handles the dynamic nature of network logs by tracking changes in the distribution of the logs and taking appropriate action in response. The problem is formulated as a distributed learning environment, in which the complete set of logs is partitioned into data chunks and a distributed weighted ensemble is built from the models learned on these chunks. Furthermore, our method is tested on a real dataset, and experimental analysis shows that it achieves reasonable scalability and accuracy.
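
The chunk-and-ensemble idea described above can be illustrated with a minimal Python sketch. It is not the implementation from the paper, and every name in it (`histogram`, `MajorityPerEvent`, `build_ensemble`, `drift_threshold`) is hypothetical. It assumes that each log record has already been parsed into an `(event_type, label)` pair, that drift is measured by the Kullback-Leibler divergence between the smoothed event-type distributions of consecutive chunks, that models trained before a detected drift are discarded, and that each remaining model is weighted by its accuracy on the newest chunk. A sequential loop stands in for distributed execution; since each chunk is fitted independently, that step could in principle be mapped across workers.

```python
import math
from collections import Counter


def histogram(chunk, vocab):
    """Add-one smoothed relative frequency of each event type in a chunk."""
    counts = Counter(event for event, _ in chunk)
    total = len(chunk) + len(vocab)
    return {v: (counts[v] + 1) / total for v in vocab}


def kl_divergence(p, q):
    """KL(p || q) for two distributions over the same keys."""
    return sum(p[k] * math.log(p[k] / q[k]) for k in p)


class MajorityPerEvent:
    """Toy base learner: most frequent label seen for each event type."""

    def fit(self, chunk):
        votes = {}
        for event, label in chunk:
            votes.setdefault(event, Counter())[label] += 1
        self.table = {e: c.most_common(1)[0][0] for e, c in votes.items()}
        # Fallback for event types not seen during training.
        self.default = Counter(y for _, y in chunk).most_common(1)[0][0]
        return self

    def predict(self, event):
        return self.table.get(event, self.default)


def accuracy(model, chunk):
    return sum(model.predict(e) == y for e, y in chunk) / len(chunk)


def build_ensemble(chunks, vocab, drift_threshold=0.1):
    """Train one model per chunk; reset the ensemble when drift is detected."""
    models, drifts, prev_hist = [], [], None
    for i, chunk in enumerate(chunks):
        hist = histogram(chunk, vocab)
        if prev_hist is not None and kl_divergence(hist, prev_hist) > drift_threshold:
            drifts.append(i)
            models = []  # assumption: pre-drift models are dropped
        models.append(MajorityPerEvent().fit(chunk))
        prev_hist = hist
    newest = chunks[-1]
    # Assumption: weight = accuracy on the most recent chunk.
    return [(m, accuracy(m, newest)) for m in models], drifts


def predict(ensemble, event):
    """Weighted vote over the ensemble members."""
    scores = Counter()
    for model, weight in ensemble:
        scores[model.predict(event)] += weight
    return scores.most_common(1)[0][0]


# Synthetic example data (not from the paper).
vocab = ["heartbeat", "link_down", "auth_fail"]
stable = [("heartbeat", "ok")] * 8 + [("link_down", "fault")] * 2
shifted = [("auth_fail", "fault")] * 6 + [("heartbeat", "ok")] * 4
ensemble, drifts = build_ensemble([stable, stable, shifted], vocab)
```

In this toy run the third chunk has a clearly different event-type distribution, so the sketch flags it as a drift point and rebuilds the ensemble from that chunk onward. The threshold value and the reset policy are illustrative choices only; a real system would tune or replace both.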
