Clone evolution: a systematic review

Detection of code clones — similar or identical source code fragments — is of concern both to researchers and to practitioners. An analysis of the clone detection results for a single source code version provides a developer with information about a discrete state in the evolution of the software system. However, tracing clones across multiple source code versions permits a clone analysis to consider a temporal dimension. Such an analysis of clone evolution can be used to uncover the patterns and characteristics exhibited by clones as they evolve within a system. Developers can use the results of this analysis to understand the clones more completely, which may help them to manage the clones more effectively. Thus, studies of clone evolution serve a key role in understanding and addressing issues of cloning in software. In this paper, we present a systematic review of the literature on clone evolution. In particular, we present a detailed analysis of 30 relevant papers that we identified in accordance with our review protocol. The review results were organized to address three research questions. Through our answers to these questions, we present the methods that researchers have used to study clone evolution, the patterns that researchers have found evolving clones to exhibit, and the evidence that researchers have established regarding the extent of inconsistent change undergone by clones during software evolution. Overall, the review results indicate that whereas researchers have conducted several empirical studies of clone evolution, there are contradictions among the reported findings, particularly regarding the lifetimes of clone lineages and the consistency with which clones are changed during software evolution. We identify human‐based empirical studies and classification of clone evolution patterns as two areas that are in particular need of further work. Copyright © 2011 John Wiley & Sons, Ltd.

[1]  Yue Jia,et al.  Distinguishing copies from originals in software clones , 2010, IWSC '10.

[2]  Giuliano Antoniol,et al.  Analyzing cloning evolution in the Linux kernel , 2002, Inf. Softw. Technol..

[3]  Michael W. Godfrey,et al.  Subjectivity in Clone Judgment: Can We Ever Agree? , 2006, Duplication, Redundancy, and Similarity in Software.

[4]  James R. Cordy,et al.  Comprehending reality - practical barriers to industrial adoption of software maintenance automation , 2003, 11th IEEE International Workshop on Program Comprehension, 2003..

[5]  Elmar Jürgens,et al.  Do code clones matter? , 2009, 2009 IEEE 31st International Conference on Software Engineering.

[6]  Magne Jørgensen,et al.  A Systematic Review of Software Development Cost Estimation Studies , 2007 .

[7]  Andrew Begel,et al.  Managing Duplicated Code with Linked Editing , 2004, 2004 IEEE Symposium on Visual Languages - Human Centric Computing.

[8]  Zoltán Ádám Mann,et al.  Three public enemies: cut, copy, and paste , 2006, Computer.

[9]  Nils Göde,et al.  Modeling Clone Evolution , 2009 .

[10]  Zhendong Su,et al.  DECKARD: Scalable and Accurate Tree-Based Detection of Code Clones , 2007, 29th International Conference on Software Engineering (ICSE'07).

[11]  Ying Zou,et al.  An Empirical Study on Inconsistent Changes to Code Clones at Release Level , 2009, 2009 16th Working Conference on Reverse Engineering.

[12]  Jens Krinke,et al.  Is Cloned Code More Stable than Non-cloned Code? , 2008, 2008 Eighth IEEE International Working Conference on Source Code Analysis and Manipulation.

[13]  Paula Gomes Mian,et al.  Systematic Review in Software Engineering , 2005 .

[14]  Daniel M. Germán,et al.  Code siblings: Technical and legal implications of copying code between applications , 2009, 2009 6th IEEE International Working Conference on Mining Software Repositories.

[15]  Shinji Kusumoto,et al.  On detection of gapped code clones using gap locations , 2002, Ninth Asia-Pacific Software Engineering Conference, 2002..

[16]  Katsuro Inoue,et al.  Similarity of software system and its measurement tool SMMT , 2007, Syst. Comput. Jpn..

[17]  Jens Krinke,et al.  A Study of Consistent and Inconsistent Changes to Code Clones , 2007, 14th Working Conference on Reverse Engineering (WCRE 2007).

[18]  Chanchal Kumar Roy,et al.  Comparison and evaluation of code clone detection techniques and tools: A qualitative approach , 2009, Sci. Comput. Program..

[19]  Giuliano Antoniol,et al.  Comparison and Evaluation of Clone Detection Tools , 2007, IEEE Transactions on Software Engineering.

[20]  Rainer Koschke,et al.  Studying clone evolution using incremental clone detection , 2013, J. Softw. Evol. Process..

[21]  Chanchal K. Roy,et al.  A Survey on Software Clone Detection Research , 2007 .

[22]  Nils Göde,et al.  Evolution of Type-1 Clones , 2009, 2009 Ninth IEEE International Working Conference on Source Code Analysis and Manipulation.

[23]  Martin P. Robillard,et al.  Clone region descriptors: Representing and tracking duplication in source code , 2010, TSEM.

[24]  Yuanyuan Zhou,et al.  CP-Miner: A Tool for Finding Copy-paste and Related Bugs in Operating System Code , 2004, OSDI.

[25]  Shinji Kusumoto,et al.  Gemini: maintenance support environment based on code clone analysis , 2002, Proceedings Eighth IEEE Symposium on Software Metrics.

[26]  Chanchal Kumar Roy,et al.  Evaluating Code Clone Genealogies at Release Level: An Empirical Study , 2010, 2010 10th IEEE Working Conference on Source Code Analysis and Manipulation.

[27]  Jeffrey C. Carver,et al.  Measuring the Efficacy of Code Clone Information in a Bug Localization Task: An Empirical Study , 2011, 2011 International Symposium on Empirical Software Engineering and Measurement.

[28]  Andrian Marcus,et al.  Identification of high-level concept clones in source code , 2001, Proceedings 16th Annual International Conference on Automated Software Engineering (ASE 2001).

[29]  Michel Wermelinger,et al.  Assessing the effect of clones on changeability , 2008, 2008 IEEE International Conference on Software Maintenance.

[30]  Tudor Gîrba,et al.  How Developers Copy , 2006, 14th IEEE International Conference on Program Comprehension (ICPC'06).

[31]  Yuanyuan Zhou,et al.  CP-Miner: finding copy-paste and related bugs in large-scale software code , 2006, IEEE Transactions on Software Engineering.

[32]  Miryung Kim,et al.  An empirical study of code clone genealogies , 2005, ESEC/FSE-13.

[33]  Rainer Koschke,et al.  Frequency and risks of changes to clones , 2011, 2011 33rd International Conference on Software Engineering (ICSE).

[34]  Rainer Koschke,et al.  Incremental Clone Detection , 2009, 2009 13th European Conference on Software Maintenance and Reengineering.

[35]  Rainer Koschke,et al.  Survey of Research on Software Clones , 2006, Duplication, Redundancy, and Similarity in Software.

[36]  Michel Wermelinger,et al.  Tracking clones' imprint , 2010, IWSC '10.

[37]  Ettore Merlo,et al.  Experiment on the automatic detection of function clones in a software system using metrics , 1996, 1996 Proceedings of International Conference on Software Maintenance.

[38]  Ying Zou,et al.  An Empirical Study on Inconsistent Changes to Code Clones at Release Level , 2009, 2009 16th Working Conference on Reverse Engineering.

[39]  Richard A. Harshman,et al.  Indexing by Latent Semantic Analysis , 1990, J. Am. Soc. Inf. Sci..

[40]  Jeffrey C. Carver,et al.  A systematic literature review to identify and classify software requirement errors , 2009, Inf. Softw. Technol..

[41]  Lerina Aversano,et al.  An empirical study on the maintenance of source code clones , 2010, Empirical Software Engineering.

[42]  Ferosh Jacob,et al.  CnP: Towards an environment for the proactive management of copy-and-paste programming , 2009, 2009 IEEE 17th International Conference on Program Comprehension.

[43]  Arie van Deursen,et al.  Managing code clones using dynamic change tracking and resolution , 2009, 2009 IEEE International Conference on Software Maintenance.

[44]  Hoan Anh Nguyen,et al.  Clone-Aware Configuration Management , 2009, 2009 IEEE/ACM International Conference on Automated Software Engineering.

[45]  Shinji Kusumoto,et al.  CCFinder: A Multilinguistic Token-Based Code Clone Detection System for Large Scale Source Code , 2002, IEEE Trans. Software Eng..

[46]  Michael W. Godfrey,et al.  “Cloning considered harmful” considered harmful: patterns of cloning in software , 2008, Empirical Software Engineering.

[47]  Nils Göde,et al.  Efficiently handling clone data: RCF and cyclone , 2011, IWSC '11.

[48]  Jeffrey C. Carver,et al.  Characterizing software architecture changes: A systematic review , 2010, Inf. Softw. Technol..

[49]  Jeffrey C. Carver,et al.  The Impact of Educational Background on the Effectiveness of Requirements Inspections: An Empirical Study , 2008, IEEE Transactions on Software Engineering.

[50]  Pearl Brereton,et al.  Performing systematic literature reviews in software engineering , 2006, ICSE.

[51]  Vladimir I. Levenshtein,et al.  Binary codes capable of correcting deletions, insertions, and reversals , 1965 .

[52]  Ettore Merlo,et al.  Assessing the benefits of incorporating function clone detection in a development process , 1997, 1997 Proceedings International Conference on Software Maintenance.

[53]  Tibor Gyimóthy,et al.  New Conceptual Coupling and Cohesion Metrics for Object-Oriented Systems , 2010, 2010 10th IEEE Working Conference on Source Code Analysis and Manipulation.

[54]  Jeffrey G. Gray,et al.  An information retrieval process to aid in the analysis of code clones , 2009, Empirical Software Engineering.

[55]  Miryung Kim,et al.  An ethnographic study of copy and paste programming practices in OOPL , 2004, Proceedings. 2004 International Symposium on Empirical Software Engineering, 2004. ISESE '04..

[56]  Michael I. Jordan,et al.  Latent Dirichlet Allocation , 2001, J. Mach. Learn. Res..

[57]  Thomas L. Griffiths,et al.  Integrating Topics and Syntax , 2004, NIPS.

[58]  Letha H. Etzkorn,et al.  Special issue on information retrieval for program comprehension , 2008, Empirical Software Engineering.

[59]  Jeffrey C. Carver,et al.  On the need for human-based empirical validation of techniques and tools for code clone analysis , 2011, IWSC '11.

[60]  Lerina Aversano,et al.  How Clones are Maintained: An Empirical Study , 2007, 11th European Conference on Software Maintenance and Reengineering (CSMR'07).

[61]  David M. Blei,et al.  Hierarchical relational models for document networks , 2009, 0909.4331.

[62]  Tore Dybå,et al.  A Systematic Review of Theory Use in Software Engineering Experiments , 2007, IEEE Transactions on Software Engineering.

[63]  Gerardo Canfora,et al.  Identifying Changed Source Code Lines from Version Repositories , 2007, Fourth International Workshop on Mining Software Repositories (MSR'07:ICSE Workshops 2007).

[64]  Harald C. Gall,et al.  Relation of Code Clones and Change Couplings , 2006, FASE.

[65]  Bashar Nuseibeh,et al.  Evaluating the Harmfulness of Cloning: A Change Based Experiment , 2007, Fourth International Workshop on Mining Software Repositories (MSR'07:ICSE Workshops 2007).

[66]  Stéphane Ducasse,et al.  Semantic clustering: Identifying topics in source code , 2007, Inf. Softw. Technol..

[67]  Magne Jørgensen,et al.  A Systematic Review of Software Development Cost Estimation Studies , 2007, IEEE Transactions on Software Engineering.

[68]  Tibor Gyimóthy,et al.  Clone Smells in Software Evolution , 2007, 2007 IEEE International Conference on Software Maintenance.

[69]  Michael W. Godfrey,et al.  Facilitating software evolution research with kenyon , 2005, ESEC/FSE-13.

[70]  Premkumar T. Devanbu,et al.  Clones: what is that smell? , 2010, 2010 7th IEEE Working Conference on Mining Software Repositories (MSR 2010).