Trace Clustering Based on Conserved Patterns: Towards Achieving Better Process Models

Process mining refers to the extraction of process models from event logs. Real-life processes tend to be loosely structured and flexible. Traditional process mining algorithms have problems dealing with such unstructured processes and generate “spaghetti-like” process models that are hard to comprehend. One way to overcome this is to cluster process instances such that each resulting cluster corresponds to a coherent set of process instances that can be adequately represented by a single process model. In this paper, we present multiple feature sets based on conserved patterns and show that they perform better than contemporary approaches. We evaluate the goodness of the formed clusters using established fitness and comprehensibility metrics defined in the context of process mining. The proposed approach generates clusters such that the process models mined from the clustered traces show a high degree of fitness and comprehensibility. Further, the proposed feature sets can be discovered in linear time, making the approach amenable to real-time analysis of large data sets.
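To make the idea of clustering traces on pattern-based features concrete, the following is a minimal sketch and not the method of the paper. It assumes that each trace is a string of single-character activity labels, that a "conserved pattern" can be approximated by any contiguous subsequence occurring at least twice in the log, and that average-linkage agglomerative clustering is used. The function names (`repeated_patterns`, `feature_vector`, `agglomerative`) and the toy log are hypothetical. The naive enumeration here is also not linear time, unlike the discovery procedure the abstract refers to.

```python
from collections import Counter
from itertools import combinations
import math


def repeated_patterns(traces, min_len=2, max_len=4):
    """Contiguous activity subsequences occurring at least twice in the log.

    Assumption: this naive enumeration stands in for the conserved
    patterns of the paper; it is not the linear-time discovery.
    """
    counts = Counter()
    for trace in traces:
        for k in range(min_len, max_len + 1):
            for i in range(len(trace) - k + 1):
                counts[tuple(trace[i:i + k])] += 1
    return sorted(p for p, c in counts.items() if c >= 2)


def feature_vector(trace, patterns):
    """Count how often each pattern occurs in a single trace."""
    vec = []
    for p in patterns:
        k = len(p)
        vec.append(sum(1 for i in range(len(trace) - k + 1)
                       if tuple(trace[i:i + k]) == p))
    return vec


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def agglomerative(vectors, n_clusters):
    """Average-linkage agglomerative clustering (an assumed choice).

    Returns a list of clusters, each a sorted list of trace indices.
    """
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > n_clusters:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            d = sum(euclidean(vectors[i], vectors[j])
                    for i in clusters[a] for j in clusters[b])
            d /= len(clusters[a]) * len(clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        _, a, b = best
        clusters[a].extend(clusters[b])
        del clusters[b]
    return [sorted(c) for c in clusters]


# Hypothetical toy log: two behavioural variants of a process.
log = ["abcabc", "abcabcd", "xyzxyz", "xyzxyzw"]
patterns = repeated_patterns(log)
vectors = [feature_vector(t, patterns) for t in log]
clusters = agglomerative(vectors, n_clusters=2)
```

On this toy log the traces built from `abc` repetitions and those built from `xyz` repetitions have disjoint pattern sets, so they fall into separate clusters. A separate process model could then be mined from each cluster.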
