Fundamental Approaches to Software Engineering

Process mining techniques have matured over the last decade, and more and more organizations have started to use this new technology. The two most important types of process mining are process discovery (i.e., learning a process model from example behavior recorded in an event log) and conformance checking (i.e., comparing modeled behavior with observed behavior). Process mining is motivated by the availability of event data. However, as event logs become larger (say, terabytes), performance becomes a concern. The only way to handle larger applications while ensuring acceptable response times is to distribute the analysis over a network of computers (e.g., multicore systems, grids, and clouds). This paper provides an overview of the different ways in which process mining problems can be distributed. We identify three types of distribution: replication, a horizontal partitioning of the event log, and a vertical partitioning of the event log. These types are discussed in the context of both procedural (e.g., Petri nets) and declarative process models. Most challenging is the horizontal partitioning of event logs in the context of procedural models. Therefore, a new approach to decompose Petri nets and associated event logs is presented. This approach illustrates that process mining problems can be distributed in various ways.
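
The three types of distribution can be illustrated with a minimal Python sketch on a toy event log. The abstract does not define the terms, so the sketch assumes that a vertical partitioning assigns whole cases to nodes and that a horizontal partitioning projects every case onto subsets of activities. The log contents, function names, and the round-robin assignment are hypothetical and are not taken from the paper.

```python
# Toy event log (hypothetical data): each trace is (case_id, ordered activities).
LOG = [
    ("c1", ["a", "b", "c", "d"]),
    ("c2", ["a", "c", "b", "d"]),
    ("c3", ["a", "e", "d"]),
]


def replicate(log, n_nodes):
    """Replication: every node receives a full copy of the log."""
    return [list(log) for _ in range(n_nodes)]


def partition_by_case(log, n_nodes):
    """Assumed meaning of 'vertical': whole cases are assigned to nodes.

    Round-robin assignment is an arbitrary choice made for this sketch.
    """
    parts = [[] for _ in range(n_nodes)]
    for i, trace in enumerate(log):
        parts[i % n_nodes].append(trace)
    return parts


def partition_by_activity(log, activity_sets):
    """Assumed meaning of 'horizontal': each case is projected onto
    a subset of activities, one sublog per subset (subsets may overlap)."""
    return [
        [(case_id, [a for a in acts if a in subset]) for case_id, acts in log]
        for subset in activity_sets
    ]


if __name__ == "__main__":
    print(partition_by_case(LOG, 2))
    print(partition_by_activity(LOG, [{"a", "b", "c"}, {"d", "e"}]))
```

Under these assumptions, splitting by case leaves every trace intact, so each node can analyze its cases independently. Splitting by activity cuts through traces, which hints at why the abstract singles out this variant as the most challenging one for procedural models.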
