Approaching Process Mining with Sequence Clustering: Experiments and Findings

Sequence clustering is a technique from bioinformatics that is used to discover the properties of sequences by grouping them into clusters and assigning each sequence to one of those clusters. In business process mining, the goal is also to extract sequence behaviour from an event log, but the problem is often simplified by assuming that each event is already known to belong to a given process and process instance. In this paper, we describe two experiments where this information is not available. One is based on a real-world case study of observing a software development team for three weeks. The other is based on simulation and shows that it is possible to recover the original behaviour in a fully automated way. In both experiments, sequence clustering plays a central role.
