Autoencoders for improving quality of process event logs

Abstract

Low quality of business process event logs, manifested as anomalous and missing values, is often unavoidable in practice. Process analysis that uses event logs with missing and anomalous values is also likely to produce low-quality output, which in turn lowers the quality of any decisions based on it. While previous work has focused on reconstructing missing events in an event log or removing anomalous traces, in this paper we focus on detecting anomalous values and reconstructing missing values at the level of attributes in event logs. We propose methods based on autoencoders, a class of neural networks that learn to reconstruct their own input and are therefore well suited to learning a model of the complex relationships among attribute values in an event log. These methods do not rely on any a priori knowledge about the business process that generated an event log, and they are evaluated on real-world and artificially generated event logs. The paper also presents a qualitative analysis of the impact of event log cleaning and reconstruction on the output of process discovery. The proposed approach performs remarkably well on activity labels and timestamps in artificial event logs. Performance on real-world event logs, in particular for timestamp anomaly detection, is lower, which may be due to the high variability of attribute values in the chosen event logs. Process models discovered from reconstructed event logs allow behaviour with lower variability and are therefore more usable in practice.
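The reconstruction idea summarised above can be illustrated with a deliberately small sketch. It assumes a hypothetical toy log whose events carry two categorical attributes (activity and resource, one-hot encoded), and, as a simplification, it replaces the trained neural autoencoders described here with a linear autoencoder fitted in closed form: under squared reconstruction error, the optimal bias-free linear encoder/decoder with k hidden units projects onto the top-k singular subspace of the data. Events whose attributes reconstruct poorly are flagged as anomalous, and a missing attribute is filled in by repeatedly replacing it with its reconstruction. The function names, the threshold value and the toy data are illustrative assumptions rather than the authors' implementation, and timestamps are left out.

```python
import numpy as np

# Hypothetical toy log: each event has two categorical attributes,
# activity (3 values) and resource (3 values). In normal behaviour,
# activity i is always performed by resource i.
N_ACT, N_RES = 3, 3


def encode(activity, resource):
    """One-hot encode an event as [activity block | resource block]."""
    x = np.zeros(N_ACT + N_RES)
    x[activity] = 1.0
    x[N_ACT + resource] = 1.0
    return x


def fit_linear_autoencoder(X, k):
    """Closed-form optimum of a bias-free linear autoencoder with k hidden
    units: the top-k right singular vectors of X (encoder W.T, decoder W)."""
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    return vt[:k].T  # shape (d, k)


def reconstruct(W, x):
    """Encode x into k dimensions, then decode back to d dimensions."""
    return W @ (W.T @ x)


def attribute_errors(W, x):
    """Squared reconstruction error, summed per attribute block."""
    r = (x - reconstruct(W, x)) ** 2
    return {"activity": float(r[:N_ACT].sum()), "resource": float(r[N_ACT:].sum())}


def is_anomalous(W, x, threshold=0.1):
    """Flag an event whose total reconstruction error exceeds a
    hypothetical threshold (in practice it would be tuned on data)."""
    return sum(attribute_errors(W, x).values()) > threshold


def impute_activity(W, resource, n_iter=10):
    """Reconstruct a missing activity label: start from an all-zero
    activity block, repeatedly overwrite it with its reconstruction
    (the known resource block stays fixed), then take the argmax."""
    x = np.zeros(N_ACT + N_RES)
    x[N_ACT + resource] = 1.0
    for _ in range(n_iter):
        x[:N_ACT] = reconstruct(W, x)[:N_ACT]
    return int(np.argmax(x[:N_ACT]))


# "Training" data: 30 normal events with activity i performed by resource i.
X = np.array([encode(i % 3, i % 3) for i in range(30)])
W = fit_linear_autoencoder(X, k=3)

print("normal event:   ", attribute_errors(W, encode(1, 1)))
print("anomalous event:", attribute_errors(W, encode(0, 2)))
print("anomalous?      ", is_anomalous(W, encode(0, 2)))
print("imputed activity for resource 2:", impute_activity(W, 2))
```

Because the toy normal data contain only three activity/resource pairs, the unseen pair (activity 0, resource 2) is reconstructed halfway between two known patterns and receives a total error of 1.0, split evenly across both attributes, whereas normal events reconstruct almost exactly. This also exposes a limitation of the sketch: the reconstruction error alone does not reveal which of the two attributes is the wrong one.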
