The impact of biased sampling of event logs on the performance of process discovery

Process discovery algorithms discover process models from event data captured during the execution of business processes. These algorithms typically use the entire event log. For large event logs, running discovery on standard hardware within a limited time is no longer feasible. A straightforward way to overcome this problem is to downsize the data using random sampling. However, little research has been conducted on selecting the right sample given the available time and the characteristics of the event data. This paper systematically evaluates various biased sampling methods and assesses their performance on different datasets using four different discovery techniques. Our experiments show that biased sampling can considerably speed up discovery techniques without any loss in the quality of the resulting process models. Furthermore, because sampling implicitly filters the data (removing outliers), the model quality may even improve.
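To make the contrast between random and biased sampling concrete, here is a minimal Python sketch under stated assumptions. An event log is modeled as a plain list of traces, each a tuple of activity labels. `random_sample` draws traces uniformly. `frequency_biased_sample` is a hypothetical biased sampler that favors traces of frequent variants. It is an illustration only, not necessarily one of the methods evaluated in the paper. It shows how biasing the sample toward frequent behavior drops rare (potentially outlying) variants as a side effect, which is the implicit filtering mentioned above.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple

# Assumption: a trace is a tuple of activity labels; a log is a list of traces.
Trace = Tuple[str, ...]


def random_sample(log: Sequence[Trace], ratio: float, seed: int = 0) -> List[Trace]:
    """Uniform random sampling: every trace has the same chance to be kept."""
    rng = random.Random(seed)
    k = max(1, round(ratio * len(log)))
    return rng.sample(list(log), k)


def frequency_biased_sample(log: Sequence[Trace], ratio: float) -> List[Trace]:
    """Hypothetical biased sampler: keep traces of the most frequent variants first.

    A variant is an exact activity sequence. Traces are ranked by how often
    their variant occurs in the log, so rare variants are the first to be dropped.
    """
    counts = Counter(log)
    k = max(1, round(ratio * len(log)))
    # sorted() is stable even with reverse=True, so ties keep the log order.
    ranked = sorted(log, key=lambda t: counts[t], reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    # Toy log: one dominant variant, one less common variant, one rare trace.
    log = [("a", "b", "c")] * 6 + [("a", "c", "b")] * 3 + [("a", "x", "b", "c")]
    print(Counter(random_sample(log, 0.5, seed=1)))
    # Keeps 5 traces, all of the dominant variant ("a", "b", "c").
    print(Counter(frequency_biased_sample(log, 0.5)))
```

With a 50% sampling ratio, the biased sampler keeps only the dominant variant. The random sampler may or may not include the rare trace `("a", "x", "b", "c")`, depending on the seed.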
