Towards clone detection in UML domain models

Code clones (i.e., duplicate fragments of code) have long been studied, and there is strong evidence that they are a major source of software faults. Anecdotal evidence suggests that the phenomenon occurs similarly in models, which would make model clones as detrimental to model quality as code clones are to code quality. However, programming language code and visual models differ significantly, which makes it difficult to directly transfer notions and algorithms developed for code clones to model clones. In this article, we develop a definition of the notion of “model clone” based on a thorough analysis of practical scenarios. We formalize this definition, specify a clone detection algorithm for UML domain models, and implement it as a prototype. We investigate different similarity heuristics to be used in the algorithm, and report the performance of our approach. While we believe that our approach advances the state of the art significantly, it is restricted to UML models, its results leave room for improvement, and it has not been validated by field studies.
