Process diagnostics using trace alignment: Opportunities, issues, and challenges

Business processes leave trails in a variety of data sources (e.g., audit trails, databases, and transaction logs). Hence, every process instance can be described by a trace, i.e., a sequence of events. Process mining techniques can extract knowledge from such traces and provide a welcome extension to the repertoire of business process analysis techniques. Recently, process mining techniques have been adopted in various commercial BPM systems (e.g., BPM|one, Futura Reflect, ARIS PPM, Fujitsu Interstage, Businesscape, Iontas PDF, and QPR PA). Unfortunately, traditional process discovery algorithms have difficulty dealing with less structured processes: the resulting models are hard to comprehend or even misleading. Therefore, we propose a new approach based on trace alignment. The goal is to align traces so that event logs can be explored easily. Trace alignment can be used both to explore the process in the early stages of analysis and to answer specific questions in later stages. Hence, it complements existing process mining techniques that focus on discovery and conformance checking. The proposed techniques have been implemented as plugins in the ProM framework. We report the results of trace alignment on one synthetic and two real-life event logs and show that trace alignment holds significant promise for process diagnostics.
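To make the idea of aligning traces concrete, the following is a minimal sketch of pairwise global alignment of two traces in the Needleman-Wunsch style. The abstract does not specify which alignment algorithm or scoring scheme the ProM plugins use, so the function name `align_traces`, the gap symbol `-`, and the unit scores (match +1, mismatch -1, gap -1) are illustrative assumptions. The sketch aligns only two traces, whereas aligning a whole event log requires a multiple alignment of many traces. Gaps inserted into a trace show where one process instance lacks an activity that another one performs.

```python
from typing import List, Tuple

GAP = "-"  # hypothetical gap symbol used to pad aligned traces


def align_traces(a: List[str], b: List[str],
                 match: int = 1, mismatch: int = -1, gap: int = -1
                 ) -> Tuple[List[str], List[str], int]:
    """Globally align two traces (sequences of activity labels).

    Needleman-Wunsch dynamic programming with a linear gap cost.
    The unit scores are illustrative assumptions, not the paper's scheme.
    Returns both aligned traces (padded with GAP) and the optimal score.
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning the prefixes a[:i] and b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap  # a[:i] aligned entirely against gaps
    for j in range(1, m + 1):
        score[0][j] = j * gap  # b[:j] aligned entirely against gaps

    def sub(x: str, y: str) -> int:
        # Substitution score for placing two activities in the same column
        return match if x == y else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,   # gap in trace b
                              score[i][j - 1] + gap)   # gap in trace a

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a: List[str] = []
    out_b: List[str] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append(GAP); i -= 1
        else:
            out_a.append(GAP); out_b.append(b[j - 1]); j -= 1
    return out_a[::-1], out_b[::-1], score[n][m]


if __name__ == "__main__":
    x, y, s = align_traces(["A", "B", "C", "D"], ["A", "C", "D"])
    print(" ".join(x))  # A B C D
    print(" ".join(y))  # A - C D
    print(s)            # 2
```

Placing the aligned traces row by row makes shared activities line up in columns, so a deviation (here, the second instance skipping `B`) stands out as a gap. This is the kind of visual exploration of an event log that the abstract describes.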
