Process Discovery Algorithms Using Numerical Abstract Domains

Discovering process models from event logs has emerged as a crucial problem in providing continuous support throughout the life cycle of an information system. However, after a decade of process discovery research, the available algorithms and tools are known to have strong limitations in several dimensions. Two main challenges remain: the size of the logs and the formal properties of the discovered model. In this paper we propose the use of numerical abstract domains to tackle both problems for the particular case of Petri net discovery. First, numerical abstract domains enable the discovery of general process models without requiring prior knowledge (e.g., the bound of the Petri net to derive) as input to the discovery algorithm. Second, divide-and-conquer techniques allow us to control the size of the process discovery problems. The proposed methods have been implemented in a prototype tool, and we report experiments that illustrate the significance of this fresh view of the process discovery problem.
