Incremental Clone Detection

Finding, understanding, and managing software clones (passages of duplicated source code) is of great interest in research and practice. There is an abundance of techniques to detect clones. However, all these techniques are limited to a single revision of a program. When the code changes, the analysis must be run again from scratch, even though only small parts may have changed. In this paper, we present an incremental clone detection algorithm, which detects clones based on the results of the previous revision's analysis. Moreover, it creates a mapping from the clones of one revision to those of the next, supplying information about the addition and deletion of clones. Our empirical results demonstrate that the incremental technique requires considerably less time than a non-incremental approach if the changes do not exceed a certain fraction of the source code. An incremental analysis is useful for on-the-fly detection and for evolutionary clone analysis. On-the-fly detection may be integrated into an IDE and allows clone detection to be re-run immediately when programmers save their changes, or even while they are typing. In evolutionary clone analysis, many revisions of a system need to be analyzed in order to understand how clones evolve.
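
To make the idea of reusing the previous revision's results concrete, the following is a minimal sketch and not the algorithm presented in this paper. It assumes a much simpler detector: an index of normalized fixed-size line windows that is updated only for the files that changed. The names IncrementalIndex, windows, diff_clones, and WINDOW are hypothetical and serve illustration only.

```python
from collections import defaultdict

WINDOW = 3  # hypothetical minimum clone length, in lines


def windows(lines):
    """Yield (start_index, key) for every WINDOW-line chunk of a file."""
    norm = [ln.strip() for ln in lines]
    for i in range(len(norm) - WINDOW + 1):
        yield i, tuple(norm[i:i + WINDOW])


class IncrementalIndex:
    """Assumed toy detector: chunk index kept alive between revisions."""

    def __init__(self):
        self.index = defaultdict(set)    # chunk key -> {(path, start)}
        self.by_file = defaultdict(set)  # path -> chunk keys found in it

    def update(self, path, lines):
        """Re-index one changed file; entries of other files are reused."""
        for key in self.by_file.pop(path, set()):
            self.index[key] = {loc for loc in self.index[key] if loc[0] != path}
            if not self.index[key]:
                del self.index[key]
        for start, key in windows(lines):
            self.index[key].add((path, start))
            self.by_file[path].add(key)

    def clones(self):
        """Return the chunks that occur at more than one location."""
        return {k: frozenset(v) for k, v in self.index.items() if len(v) > 1}


def diff_clones(before, after):
    """Map one revision's clones to the next: (added, deleted) chunk keys."""
    return set(after) - set(before), set(before) - set(after)


# Revision 1: two files share a three-line passage.
idx = IncrementalIndex()
idx.update("a.py", ["x = 1", "y = 2", "z = x + y", "print(z)"])
idx.update("b.py", ["x = 1", "y = 2", "z = x + y"])
rev1 = idx.clones()

# Revision 2: only b.py changed, so only b.py is re-indexed.
idx.update("b.py", ["x = 1", "y = 3", "z = x + y"])
rev2 = idx.clones()

added, deleted = diff_clones(rev1, rev2)
```

In this sketch the cost of an update depends on the size of the changed file rather than on the size of the whole system, which is the property the incremental approach aims for. Comparing the clone sets of two consecutive revisions yields the added and deleted clones.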
