Detecting anomalies in business process event logs using statistical leverage

Abstract Anomalous information in a business process event log, such as missing, duplicated or swapped events, makes it harder to extract useful insights from event log analysis. A number of approaches exist in the literature to detect anomalous cases in event logs based on different paradigms, such as probabilistic, distance-based or reconstruction-based anomaly detection. This paper proposes a novel method for anomaly detection in event logs based on the information-theoretic paradigm, which has not previously been considered for event log anomaly detection. In particular, we propose an anomaly score for the cases of a process based on statistical leverage, together with three different methods for setting the anomaly detection threshold. The proposed approach does not require the large data sets needed to train machine learning models, as is the case, for instance, in reconstruction-based approaches. In experiments conducted on publicly available event logs, the proposed approach shows remarkable anomaly detection capability compared with existing methods in the literature. One of the proposed anomaly detection thresholds is also shown to handle variable case anomaly ratios more effectively than other methods in the literature.
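To make the notion of a leverage-based score concrete, the following is a minimal sketch, not the method proposed in the paper. It assumes that each case is encoded as a one-hot vector over (position, activity) pairs, padded with zeros, and it takes the leverage of a case to be the corresponding diagonal entry of the hat matrix H = X (X^T X)^+ X^T. The helper names `encode_cases` and `leverage_scores`, the toy log, and the mean-plus-two-standard-deviations threshold are illustrative assumptions; the abstract does not specify the encoding, and the threshold is not one of the three thresholds the paper proposes.

```python
import numpy as np


def encode_cases(cases):
    """One-hot encode cases over (position, activity) pairs, zero-padded.

    Assumption: this encoding is illustrative only; the abstract does not
    state how cases are turned into a numeric matrix.
    """
    activities = sorted({a for case in cases for a in case})
    index = {a: i for i, a in enumerate(activities)}
    max_len = max(len(case) for case in cases)
    X = np.zeros((len(cases), max_len * len(activities)))
    for row, case in enumerate(cases):
        for pos, activity in enumerate(case):
            X[row, pos * len(activities) + index[activity]] = 1.0
    return X


def leverage_scores(X):
    """Diagonal of the hat matrix H = X (X^T X)^+ X^T, via a thin SVD."""
    X = np.asarray(X, dtype=float)
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    # Keep only the left singular vectors of numerically non-zero singular values.
    tol = s.max() * max(X.shape) * np.finfo(float).eps
    U = U[:, s > tol]
    # The leverage of row i is the squared norm of row i of U.
    return np.sum(U ** 2, axis=1)


# Hypothetical toy log: nine regular cases and one case with two swapped events.
cases = [["a", "b", "c", "d"]] * 9 + [["a", "c", "b", "d"]]

X = encode_cases(cases)
scores = leverage_scores(X)

# Placeholder threshold (an assumption, not one of the paper's thresholds).
threshold = scores.mean() + 2 * scores.std()
flagged = np.flatnonzero(scores > threshold)

print(scores)
print(flagged)
```

In this toy log the swapped case is the only case of its kind, so it receives the highest leverage, while the nine identical cases share a low score. The leverages always sum to the rank of the encoded matrix, which is why rare behaviour concentrates a large share of the total score in a few cases.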
