Do code clones matter?

Code cloning is not only assumed to inflate maintenance costs but also considered defect-prone as inconsistent changes to code duplicates can lead to unexpected behavior. Consequently, the identification of duplicated code, clone detection, has been a very active area of research in recent years. Up to now, however, no substantial investigation of the consequences of code cloning on program correctness has been carried out. To remedy this shortcoming, this paper presents the results of a large-scale case study that was undertaken to find out if inconsistent changes to cloned code can indicate faults. For the analyzed commercial and open source systems we not only found that inconsistent changes to clones are very frequent but also identified a significant number of faults induced by such changes. The clone detection tool used in the case study implements a novel algorithm for the detection of inconsistent clones. It is available as open source to enable other researchers to use it as basis for further investigations.

[1]  Hanjo Täubig,et al.  Fast Structure Searching for Computational Proteomics , 2007 .

[2]  Michel Wermelinger,et al.  Assessing the effect of clones on changeability , 2008, 2008 IEEE International Conference on Software Maintenance.

[3]  Esko Ukkonen,et al.  Approximate String-Matching over Suffix Trees , 1993, CPM.

[4]  Esko Ukkonen,et al.  On-line construction of suffix trees , 1995, Algorithmica.

[5]  Ralf Lämmel,et al.  Semi‐automatic grammar recovery , 2001, Softw. Pract. Exp..

[6]  Yuanyuan Zhou,et al.  CP-Miner: finding copy-paste and related bugs in large-scale software code , 2006, IEEE Transactions on Software Engineering.

[7]  Esko Ukkonen,et al.  Two Algorithms for Approximate String Matching in Static Texts , 1991, MFCS.

[8]  Akito Monden,et al.  Software quality analysis by code clones in industrial legacy software , 2002, Proceedings Eighth IEEE Symposium on Software Metrics.

[9]  Susan Horwitz,et al.  Using Slicing to Identify Duplication in Source Code , 2001, SAS.

[10]  Jens Krinke,et al.  Identifying similar code with program dependence graphs , 2001, Proceedings Eighth Working Conference on Reverse Engineering.

[11]  Ettore Merlo,et al.  Assessing the benefits of incorporating function clone detection in a development process , 1997, 1997 Proceedings International Conference on Software Maintenance.

[12]  Rainer Koschke,et al.  Survey of Research on Software Clones , 2006, Duplication, Redundancy, and Similarity in Software.

[13]  Stéphane Ducasse,et al.  A language independent approach for detecting duplicated code , 1999, Proceedings IEEE International Conference on Software Maintenance - 1999 (ICSM'99). 'Software Maintenance for Business Change' (Cat. No.99CB36360).

[14]  Miryung Kim,et al.  An empirical study of code clone genealogies , 2005, ESEC/FSE-13.

[15]  Zhendong Su,et al.  Scalable detection of semantic clones , 2008, 2008 ACM/IEEE 30th International Conference on Software Engineering.

[16]  Chanchal K. Roy,et al.  A Survey on Software Clone Detection Research , 2007 .

[17]  Rainer Koschke,et al.  Clone Detection Using Abstract Syntax Suffix Trees , 2006, 2006 13th Working Conference on Reverse Engineering.

[18]  Daqing Hou,et al.  CReN: a tool for tracking copy-and-paste code clones and renaming identifiers consistently in the IDE , 2007, eclipse '07.

[19]  Michael W. Godfrey,et al.  "Cloning Considered Harmful" Considered Harmful , 2006, 2006 13th Working Conference on Reverse Engineering.

[20]  Elmar Jürgens,et al.  Static Bug Detection Through Analysis of Inconsistent Clones , 2008, Software Engineering.

[21]  Michael Eichberg,et al.  A Handbook of Software and Systems Engineering , 2009 .

[22]  Martin P. Robillard,et al.  Tracking Code Clones in Evolving Software , 2007, 29th International Conference on Software Engineering (ICSE'07).

[23]  Lerina Aversano,et al.  How Clones are Maintained: An Empirical Study , 2007, 11th European Conference on Software Maintenance and Reengineering (CSMR'07).

[24]  Elmar Jürgens,et al.  CloneDetective - A workbench for clone detection research , 2009, 2009 IEEE 31st International Conference on Software Engineering.

[25]  Zhendong Su,et al.  DECKARD: Scalable and Accurate Tree-Based Detection of Code Clones , 2007, 29th International Conference on Software Engineering (ICSE'07).

[26]  Shinji Kusumoto,et al.  On detection of gapped code clones using gap locations , 2002, Ninth Asia-Pacific Software Engineering Conference, 2002..

[27]  Chanchal Kumar Roy,et al.  NICAD: Accurate Detection of Near-Miss Intentional Clones Using Flexible Pretty-Printing and Code Normalization , 2008, 2008 16th IEEE International Conference on Program Comprehension.

[28]  Zhendong Su,et al.  Context-based detection of clone-related bugs , 2007, ESEC-FSE '07.

[29]  Harald C. Gall,et al.  Relation of Code Clones and Change Couplings , 2006, FASE.

[30]  Christopher W. Fraser,et al.  Clone detection via structural abstraction , 2007, 14th Working Conference on Reverse Engineering (WCRE 2007).

[31]  Elmar Jürgens,et al.  Tool Support for Continuous Quality Control , 2008, IEEE Software.

[32]  Dean W. Gonzalez,et al.  “=” considered harmful , 1991, ALET.

[33]  Jens Krinke,et al.  A Study of Consistent and Inconsistent Changes to Code Clones , 2007, 14th Working Conference on Reverse Engineering (WCRE 2007).

[34]  Tibor Gyimóthy,et al.  Clone Smells in Software Evolution , 2007, 2007 IEEE International Conference on Software Maintenance.