Studying clone evolution using incremental clone detection

Finding, understanding, and managing software clones (passages of duplicated source code) is of great interest in both research and practice. Analyzing how clones evolve across multiple versions of a program adds value in both settings. Although there is an abundance of techniques to detect clones, current approaches are limited to a single version of a program. Current clone-tracking techniques use these single-version approaches and map the clones of consecutive versions retroactively. This causes unnecessary runtime overhead and may lead to incorrect mappings due to ambiguity. In this paper, we present an incremental clone detection algorithm that detects clones based on the results of the previous version's analysis and creates a mapping between the clones of consecutive versions as part of detection. We evaluated our incremental approach with regard to its runtime advantage and the usefulness of the mapping for studies of clone evolution. Copyright © 2010 John Wiley & Sons, Ltd.
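The idea of reusing the previous version's results can be illustrated with a minimal sketch. It is not the algorithm presented in the paper: it swaps in a simple index of fixed-length token windows, and the names `IncrementalCloneIndex`, `windows`, and `WINDOW` are hypothetical. Only the files that changed between versions are re-indexed, and because a clone keeps its window key across versions, the mapping between consecutive versions is obtained during detection instead of afterwards.

```python
from collections import defaultdict

WINDOW = 3  # hypothetical minimum clone length, in tokens


def windows(tokens, size=WINDOW):
    """Yield (start, window) for every run of `size` consecutive tokens."""
    for start in range(len(tokens) - size + 1):
        yield start, tuple(tokens[start:start + size])


class IncrementalCloneIndex:
    """Toy index (an assumption, not the paper's data structure):
    token window -> set of (file, start) occurrences."""

    def __init__(self):
        self.occurrences = defaultdict(set)
        self.files = {}  # file name -> token list of the current version

    def clones(self):
        """Windows that occur at more than one position."""
        return {win: frozenset(occ)
                for win, occ in self.occurrences.items() if len(occ) > 1}

    def update(self, changed):
        """Move to the next version.

        `changed` maps file name -> new token list, or None for a deleted
        file. Unchanged files are never rescanned.
        Returns (clones of the new version, mapping old -> new occurrences).
        """
        before = self.clones()
        for name, tokens in changed.items():
            # Drop the occurrences contributed by the old file content.
            for start, win in windows(self.files.pop(name, [])):
                self.occurrences[win].discard((name, start))
                if not self.occurrences[win]:
                    del self.occurrences[win]
            # Index the new content, if the file still exists.
            if tokens is not None:
                self.files[name] = tokens
                for start, win in windows(tokens):
                    self.occurrences[win].add((name, start))
        after = self.clones()
        # A clone present in both versions is mapped by its window key.
        mapping = {win: (before[win], after[win])
                   for win in before.keys() & after.keys()}
        return after, mapping
```

Calling `update` once per version with only the changed files yields both the clones of that version and their mapping to the previous version. A retroactive approach would instead run a full detection on every version and match the two result sets afterwards.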
