An alarm prediction framework for financial IT system using hybrid machine learning methods

Informatization is advancing rapidly across all sectors, and with it the dependence on IT systems. Ensuring that these systems run safely and stably is vital, especially in finance. This paper proposes a machine learning-based framework for predicting alarm events in a financial IT system. We extract features from the system logs and build three submodules that make up the workflow: a time-series prediction module, an alarm classification module, and an alarm level division module. We apply several methods to address the obstacles that arise in each module. The time-series prediction model is built with both running time and accuracy in mind. To improve performance, we introduce ensemble learning methods in the design of the alarm classifier and alleviate the class-imbalance problem in the alarm level division process. The evaluation results show that the framework can be used in real-time applications while maintaining accuracy and reliability.

∗Corresponding author: songyou@buaa.edu.cn
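The three-module workflow described in the abstract can be illustrated with a minimal sketch. The abstract does not name the concrete models, so everything here is an assumption chosen for brevity: a moving-average forecast stands in for the time-series prediction module, a majority vote over threshold rules stands in for the ensemble alarm classifier, and random oversampling stands in for the class-imbalance step in level division. All function names, thresholds, and data values are hypothetical.

```python
import random
from collections import Counter
from statistics import mean


def forecast_next(series, window=3):
    """Stage 1 (time-series prediction): forecast the next value as the
    mean of the last `window` observations. A stand-in for a real model."""
    if len(series) < window:
        raise ValueError("series shorter than window")
    return mean(series[-window:])


def majority_vote(classifiers, x):
    """Stage 2 (alarm classification): combine base classifiers by
    majority vote, one simple form of ensemble learning."""
    votes = Counter(clf(x) for clf in classifiers)
    return votes.most_common(1)[0][0]


def oversample(samples, labels, seed=0):
    """Stage 3 helper (level division): duplicate randomly chosen
    minority-class samples until every class matches the largest one."""
    rng = random.Random(seed)
    by_label = {}
    for s, y in zip(samples, labels):
        by_label.setdefault(y, []).append(s)
    target = max(len(group) for group in by_label.values())
    out_samples, out_labels = [], []
    for y, group in by_label.items():
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for s in group + extra:
            out_samples.append(s)
            out_labels.append(y)
    return out_samples, out_labels


# Hypothetical per-minute error counts extracted from system logs.
error_counts = [2, 3, 2, 8, 9, 10]
predicted = forecast_next(error_counts, window=3)

# Hypothetical base classifiers: each flags an alarm above its own threshold.
base = [lambda v: v > 5, lambda v: v > 8, lambda v: v > 12]
is_alarm = majority_vote(base, predicted)

# Hypothetical, imbalanced alarm-level training data.
levels = ["low", "low", "low", "low", "high"]
features = [1, 2, 3, 4, 20]
bal_features, bal_levels = oversample(features, levels)
```

In a real deployment each stand-in would be replaced by a trained model; the sketch only shows how the output of the forecasting stage could feed the classifier, and how the level-division training set could be rebalanced before fitting.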
