Automatic Repair of Same-Timestamp Errors in Business Process Event Logs

This paper contributes an approach for automatically correcting “same-timestamp” errors in business process event logs. Such an error occurs when multiple events within the same process instance carry the same timestamp. These errors are common in practice and can stem from coarse logging granularity or from the performance load of the logging system. Analyzing logs that have not been properly screened for such problems is likely to yield wrong or misleading process insights. The proposed approach revolves around two techniques: one reorders the events affected by same-timestamp errors, and the other assigns an estimated timestamp to each such event. The approach has been implemented in a software prototype and extensively evaluated in different settings, using both artificial and real-life logs. The experiments show that the approach significantly reduces the number of inaccurate timestamps, while the reordering of events scales well to large and complex datasets. The evaluation is complemented by a case study in the meat and livestock domain that shows the usefulness of the approach in practice.
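To make the error definition concrete, the following minimal sketch flags groups of events that share a timestamp within one process instance. It illustrates only the detection step and is not the reordering or timestamp-estimation technique of the paper. The tuple layout `(case_id, activity, timestamp)`, the function name `find_same_timestamp_groups`, and the sample log are assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime


def find_same_timestamp_groups(events):
    """Return the groups of events that share a timestamp within one case.

    `events` is an iterable of (case_id, activity, timestamp) tuples;
    this layout is a hypothetical simplification of an event log.
    """
    buckets = defaultdict(list)
    for case_id, activity, ts in events:
        # Events collide only if both the case and the timestamp match.
        buckets[(case_id, ts)].append(activity)
    # Keep only the (case, timestamp) pairs holding more than one event.
    return {key: acts for key, acts in buckets.items() if len(acts) > 1}


# Hypothetical sample log: case "c1" has two events logged at 09:00.
log = [
    ("c1", "Receive order", datetime(2020, 1, 1, 9, 0)),
    ("c1", "Check stock", datetime(2020, 1, 1, 9, 0)),
    ("c1", "Ship order", datetime(2020, 1, 1, 10, 30)),
    ("c2", "Receive order", datetime(2020, 1, 1, 9, 0)),
]

groups = find_same_timestamp_groups(log)
for (case_id, ts), activities in groups.items():
    print(case_id, ts.isoformat(), activities)
```

The event in case "c2" is not flagged even though it also occurs at 09:00, because the error is defined per process instance rather than across the whole log.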
