Distributed genetic process mining

Process mining aims to discover process models from event logs in order to offer insight into how information systems are actually used. Most existing process mining algorithms fail to discover complex constructs or have problems dealing with noise and infrequent behavior. The genetic process mining algorithm overcomes these issues by using genetic operators to search the space of all possible process models for the fittest solution. Its main disadvantage is the computation time it requires. In this paper we present a coarse-grained distributed variant of the genetic miner that reduces this computation time. How much improvement is obtained depends strongly on the parameter values and on the characteristics of the event log. We perform an empirical evaluation to determine guidelines for setting the parameters of the distributed algorithm.
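A coarse-grained distributed genetic algorithm is commonly organized as an island model. The population is split into subpopulations ("islands") that evolve independently and periodically exchange a few individuals. The following is a minimal, generic sketch of that scheme. It is not the distributed genetic miner itself. Several pieces are assumptions made for illustration: the toy bit-string fitness (OneMax) standing in for scoring process models against an event log, the ring migration topology, the best-replace-worst migration policy, and the hypothetical parameter names `n_islands`, `migration_interval` and `n_migrants`.

```python
import random
from typing import List

Genome = List[int]


def fitness(genome: Genome) -> int:
    # Toy stand-in (OneMax: number of 1-bits). The genetic miner would instead
    # score a candidate process model against the event log.
    return sum(genome)


def tournament(pop: List[Genome], rng: random.Random, k: int = 2) -> Genome:
    # Sample k distinct individuals and return the fittest one.
    return max(rng.sample(pop, k), key=fitness)


def evolve_one_generation(pop: List[Genome], rng: random.Random, p_mut: float) -> List[Genome]:
    # Elitism: copy the best individual unchanged into the next generation.
    children = [max(pop, key=fitness)[:]]
    while len(children) < len(pop):
        a, b = tournament(pop, rng), tournament(pop, rng)
        cut = rng.randrange(1, len(a))
        child = a[:cut] + b[cut:]  # one-point crossover
        # Bit-flip mutation with per-gene probability p_mut.
        child = [1 - g if rng.random() < p_mut else g for g in child]
        children.append(child)
    return children


def migrate(islands: List[List[Genome]], n_migrants: int) -> None:
    # Hypothetical policy: ring topology, where island i receives copies of the
    # best n_migrants individuals of island i-1 and they replace its worst ones.
    emigrants = [sorted(isl, key=fitness, reverse=True)[:n_migrants] for isl in islands]
    for i, isl in enumerate(islands):
        incoming = emigrants[i - 1]  # i = 0 receives from the last island
        isl.sort(key=fitness)  # ascending, so the worst individuals come first
        isl[:n_migrants] = [g[:] for g in incoming]


def island_ga(n_islands: int = 4, island_size: int = 20, genome_len: int = 32,
              generations: int = 60, migration_interval: int = 10,
              n_migrants: int = 2, p_mut: float = 0.02, seed: int = 0) -> Genome:
    rng = random.Random(seed)
    islands = [[[rng.randint(0, 1) for _ in range(genome_len)]
                for _ in range(island_size)] for _ in range(n_islands)]
    for gen in range(1, generations + 1):
        # Islands are evolved sequentially here; in a distributed setting each
        # island would run on its own node and only migrants would be sent.
        islands = [evolve_one_generation(isl, rng, p_mut) for isl in islands]
        if gen % migration_interval == 0:
            migrate(islands, n_migrants)
    return max((g for isl in islands for g in isl), key=fitness)
```

Calling `island_ga()` returns the best genome found across all islands. Fitness evaluation is local to each island, and communication happens only every `migration_interval` generations. This is why a coarse-grained scheme can cut wall-clock time when islands run on separate machines. The number of islands, the island size, the migration interval and the number of migrants are the kind of parameters whose values determine how large that gain is.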
