Scalable process discovery and conformance checking

Organisations nowadays collect and store considerable amounts of data, including process events. Discovering a process model from such event data and verifying the quality of the discovered model are important steps in process mining. Many discovery techniques have been proposed, but none of them combines scalability with strong quality guarantees. We would like such techniques to handle billions of events or thousands of activities, to produce sound models (without deadlocks and other anomalies), and to guarantee that the underlying process can be rediscovered when sufficient information is available. In this paper, we introduce a framework for process discovery that ensures these properties while passing over the log only once, together with three algorithms that use the framework. To measure the quality of discovered models for such large logs, we introduce a model–model and model–log comparison framework that applies a divide-and-conquer strategy to measure recall, fitness, and precision. We experimentally show that these discovery and measuring techniques sacrifice little compared to other algorithms, while gaining the ability to cope with event logs of 100,000,000 traces and processes of 10,000 activities on a standard computer.
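
To make the single-pass idea concrete, the sketch below reads a log once and reduces it to a small summary whose size depends on the number of activities rather than the number of events. The abstract does not say which summary the framework builds; the directly-follows counts used here are an assumption chosen for illustration, and the function name `directly_follows` and the toy log are hypothetical.

```python
from collections import Counter
from typing import Iterable, Sequence, Tuple


def directly_follows(
    traces: Iterable[Sequence[str]],
) -> Tuple[Counter, Counter, Counter]:
    """Summarise an event log in one pass (illustrative assumption).

    Returns counts of directly-follows pairs, start activities and
    end activities. Memory grows with the number of distinct
    activity pairs, not with the number of traces or events.
    """
    edges: Counter = Counter()
    starts: Counter = Counter()
    ends: Counter = Counter()
    for trace in traces:  # each trace is visited exactly once
        if not trace:
            continue  # skip empty traces in this sketch
        starts[trace[0]] += 1
        ends[trace[-1]] += 1
        # Count every pair of consecutive activities in the trace.
        for a, b in zip(trace, trace[1:]):
            edges[(a, b)] += 1
    return edges, starts, ends


# Hypothetical toy log: three traces over activities a, b, c.
log = [("a", "b", "c"), ("a", "c", "b"), ("a", "b", "c")]
edges, starts, ends = directly_follows(log)
```

Because `traces` can be any iterable, the same function could consume a generator that streams traces from disk, so the full log never has to be held in memory.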
