Fast detection of exact clones in business process model repositories

As organizations reach higher levels of business process management maturity, they often find themselves maintaining very large process model repositories, representing valuable knowledge about their operations. A common practice within these repositories is to create new process models, or extend existing ones, by copying and merging fragments from other models. We contend that if these duplicate fragments, a.k.a. exact clones, can be identified and factored out as shared subprocesses, the repository's maintainability can be greatly improved. With this purpose in mind, we propose an indexing structure to support fast detection of clones in process model repositories. Moreover, we show how this index can be used to efficiently query a process model repository for fragments. This index, called RPSDAG, is based on a novel combination of a method for process model decomposition (namely the Refined Process Structure Tree), with established graph canonization and string matching techniques. We evaluated the RPSDAG with large process model repositories from industrial practice. The experiments show that a significant number of non-trivial clones can be efficiently found in such repositories, and that fragment queries can be handled efficiently.
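The indexing idea can be illustrated with a small sketch. It is not the RPSDAG itself: it assumes each model has already been decomposed into a fragment tree, and it covers only tasks, ordered sequences and unordered bonds. Fragments with arbitrary internal structure are left out, because they would need a real graph canonization step. The class `FragmentIndex`, the tuple encoding of fragments and the task labels are all hypothetical and chosen for illustration.

```python
from collections import defaultdict


class FragmentIndex:
    """Toy clone index (hypothetical, not the RPSDAG): equal fragments share one id."""

    def __init__(self):
        self.ids = {}                         # canonical code -> fragment id
        self.occurrences = defaultdict(list)  # fragment id -> models containing it
        self.sizes = {}                       # fragment id -> number of tasks

    def insert(self, model, fragment):
        """Index `fragment` bottom-up; return (fragment id, size in tasks)."""
        kind, payload = fragment
        if kind == "task":
            code, size = ("task", payload), 1
        else:
            children = [self.insert(model, child) for child in payload]
            child_ids = [fid for fid, _ in children]
            if kind == "bond":
                # Branch order is irrelevant in a bond, so sort for a canonical form.
                child_ids.sort()
            code = (kind, tuple(child_ids))
            size = sum(s for _, s in children)
        # Reuse the id if this canonical code was seen before, else assign a new one.
        fid = self.ids.setdefault(code, len(self.ids))
        self.sizes[fid] = size
        self.occurrences[fid].append(model)
        return fid, size

    def clones(self, min_size=2):
        """Fragments that occur at least twice and contain at least `min_size` tasks."""
        return {
            fid: models
            for fid, models in self.occurrences.items()
            if len(models) >= 2 and self.sizes[fid] >= min_size
        }


# Two made-up models that share the fragment "Check invoice -> Approve invoice".
approve = ("seq", [("task", "Check invoice"), ("task", "Approve invoice")])
index = FragmentIndex()
index.insert("model_A", ("seq", [("task", "Receive invoice"), approve, ("task", "Pay")]))
index.insert("model_B", ("bond", [approve, ("task", "Reject invoice")]))
print(index.clones())  # {3: ['model_A', 'model_B']}
```

Because a parent's code is built from the ids of its children rather than from their full contents, each distinct fragment is stored once and comparing two fragments reduces to a dictionary lookup. The `min_size` threshold stands in for the notion of non-trivial clones by discarding single tasks. Sub-fragments of a larger clone are reported as clones as well.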
