Split Miner: automated discovery of accurate and simple business process models from event logs

The problem of automated discovery of process models from event logs has been intensively researched over the past two decades. Despite a rich field of proposals, state-of-the-art automated process discovery methods suffer from two recurrent deficiencies when applied to real-life logs: (i) they produce large, spaghetti-like models; and (ii) they produce models that either fit the event log poorly (low fitness) or over-generalize it (low precision). Striking a trade-off between these quality dimensions in a robust and scalable manner has proved elusive. This paper presents an automated process discovery method, Split Miner, which produces simple process models with low branching complexity and consistently high and balanced fitness and precision, while achieving considerably faster execution times than state-of-the-art methods, as measured on a benchmark covering twelve real-life event logs. Split Miner combines a novel approach to filtering the directly-follows graph induced by an event log with an approach to identifying combinations of split gateways that accurately capture the concurrency, conflict and causal relations between neighbors in the directly-follows graph. Split Miner is also the first automated process discovery method that is guaranteed to produce deadlock-free process models with concurrency, while not being restricted to producing block-structured process models.
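
To make the notion of a directly-follows graph induced by an event log concrete, the following is a minimal sketch of how such a graph can be built. It is not Split Miner's filtering or split-discovery procedure; the function name `directly_follows_graph` and the toy log are assumptions introduced here for illustration only.

```python
from collections import Counter


def directly_follows_graph(log):
    """Count how often activity a is immediately followed by activity b.

    `log` is assumed to be a list of traces, each trace being the ordered
    list of activity labels recorded for one case.
    """
    dfg = Counter()
    for trace in log:
        # Pair each event with its immediate successor in the same trace.
        for a, b in zip(trace, trace[1:]):
            dfg[(a, b)] += 1
    return dfg


# Hypothetical toy log (not taken from the paper's benchmark).
toy_log = [
    ["a", "b", "c", "d"],
    ["a", "c", "b", "d"],
    ["a", "b", "c", "d"],
]

dfg = directly_follows_graph(toy_log)
for (source, target), frequency in sorted(dfg.items()):
    print(f"{source} -> {target}: {frequency}")
```

In this toy log, `b` and `c` follow each other in both orders. Arcs in both directions between two neighbors are the kind of pattern a discovery method may interpret as concurrency rather than as a loop, which is why the relations between neighbors in this graph matter for choosing split gateways.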
