An insight into the dispersion of changes in cloned and non-cloned code: A genealogy based empirical study

In this paper, we present an in-depth empirical study of a new metric, change dispersion, that measures the extent to which changes are scattered throughout the code of a software system. Intuitively, highly dispersed changes, that is, changes scattered across many software entities (such as files, classes, methods, and variables), should require more maintenance effort than changes that affect only a few entities. We investigate change dispersion on the code base of each subject system as a whole, and separately on each system's cloned and non-cloned code. Our central objective is to determine whether cloned code negatively affects software evolution and maintenance. We measure dispersion at method-level granularity. Our experimental results on 16 open source subject systems written in four different programming languages (Java, C, C#, and Python), involving two clone detection tools (CCFinderX and NiCad) and considering three major types of clones (Type 1: exact, Type 2: dissimilar naming, and Type 3: some dissimilar code), suggest that change dispersion has a positive and statistically significant correlation with the change-proneness (or instability) of source code. Cloned code, especially in Java and C systems, often exhibits higher change dispersion than non-cloned code. In addition, changes to Type 3 clones are more dispersed than changes to Type 1 and Type 2 clones. According to our analysis, a primary cause of high change dispersion in cloned code is that clones from the same clone class often require corresponding changes to remain consistent.
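The abstract does not state the formula for change dispersion, so the following is only a minimal sketch under an assumption: dispersion for a code region is taken to be the percentage of its methods touched by at least one change. The function name `change_dispersion`, the method identifiers, and the sample data are hypothetical and serve only to illustrate the cloned versus non-cloned comparison.

```python
from typing import Iterable, Set


def change_dispersion(commits: Iterable[Set[str]], methods: Set[str]) -> float:
    """Percentage of `methods` touched by at least one commit.

    Assumption (not taken from the paper): dispersion is the share of
    methods in a region that were changed at least once.
    """
    if not methods:
        return 0.0
    touched: Set[str] = set()
    for changed in commits:
        # Count only the changed methods that belong to this region.
        touched |= changed & methods
    return 100.0 * len(touched) / len(methods)


# Hypothetical history: each set holds the methods changed by one commit.
commits = [{"a", "b"}, {"b", "c"}, {"a"}]
cloned = {"a", "b"}                 # methods containing clones
non_cloned = {"c", "d", "e", "f"}   # all remaining methods

cloned_dispersion = change_dispersion(commits, cloned)
non_cloned_dispersion = change_dispersion(commits, non_cloned)
print(cloned_dispersion, non_cloned_dispersion)
```

In this toy history every cloned method is touched while only one of the four non-cloned methods is, so the cloned region comes out as more dispersed under the assumed definition.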
