An empirical study on the fault-proneness of clone migration in clone genealogies

Copy and paste activities create clone groups in software systems. The evolution of a clone group across the history of a software system is termed as clone genealogy. During the evolution of a clone group, developers may change the location of the code fragments in the clone group. The type of the clone group may also change (e.g., from Type-1 to Type-2). These two phenomena have been referred to as clone migration and clone mutation respectively. Previous studies have found that clone migration occur frequently in software systems, and suggested that clone migration can induce faults in a software system. In this paper, we examine how clone migration phenomena affect the risk for faults in clone segments, clone groups, and clone genealogies from three long-lived software systems JBoss, APACHE-ANT, and ARGOUML. Results show that: (1) migrated clone segments, clone groups, and clone genealogies are not equally fault-prone; (2) when a clone mutation occurs during a clone migration, the risk for faults in the migrated clone is increased; (3) migrating a clone that was not changed for a longer period of time is risky.

[1]  Chanchal Kumar Roy,et al.  NICAD: Accurate Detection of Near-Miss Intentional Clones Using Flexible Pretty-Printing and Code Normalization , 2008, 2008 16th IEEE International Conference on Program Comprehension.

[2]  Ahmed E. Hassan,et al.  Predicting faults using the complexity of code changes , 2009, 2009 IEEE 31st International Conference on Software Engineering.

[3]  Ahmed E. Hassan,et al.  MapReduce as a general framework to support research in Mining Software Repositories (MSR) , 2009, 2009 6th IEEE International Working Conference on Mining Software Repositories.

[4]  Liliane Jeanne Barbour EMPIRICAL STUDIES OF CODE CLONE GENEALOGIES , 2012 .

[5]  Martin P. Robillard,et al.  Tracking Code Clones in Evolving Software , 2007, 29th International Conference on Software Engineering (ICSE'07).

[6]  Chanchal K. Roy,et al.  Analyzing and Forecasting Near-Miss Clones in Evolving Software: An Empirical Study , 2011, 2011 16th IEEE International Conference on Engineering of Complex Computer Systems.

[7]  Audris Mockus,et al.  Identifying reasons for software changes using historic databases , 2000, Proceedings 2000 International Conference on Software Maintenance.

[8]  D. Sheskin Handbook of Parametric and Nonparametric Statistical Procedures: Third Edition , 2000 .

[9]  Ying Zou,et al.  Studying the Impact of Clones on Software Defects , 2010, 2010 17th Working Conference on Reverse Engineering.

[10]  Richard C. Holt,et al.  Studying the evolution of software systems using evolutionary code extractors , 2004 .

[11]  David J. Sheskin,et al.  Handbook of Parametric and Nonparametric Statistical Procedures , 1997 .

[12]  Nils Göde,et al.  Evolution of Type-1 Clones , 2009, 2009 Ninth IEEE International Working Conference on Source Code Analysis and Manipulation.

[13]  Giuliano Antoniol,et al.  Comparison and Evaluation of Clone Detection Tools , 2007, IEEE Transactions on Software Engineering.

[14]  Lerina Aversano,et al.  How Clones are Maintained: An Empirical Study , 2007, 11th European Conference on Software Maintenance and Reengineering (CSMR'07).

[15]  Tibor Gyimóthy,et al.  Clone Smells in Software Evolution , 2007, 2007 IEEE International Conference on Software Maintenance.

[16]  Foutse Khomh,et al.  An empirical study of the fault-proneness of clone mutation and clone migration , 2013, 2013 10th Working Conference on Mining Software Repositories (MSR).

[17]  Miryung Kim,et al.  An empirical study of code clone genealogies , 2005, ESEC/FSE-13.

[18]  Lerina Aversano,et al.  An empirical study on the maintenance of source code clones , 2010, Empirical Software Engineering.

[19]  Foutse Khomh,et al.  Late propagation in software clones , 2011, 2011 27th IEEE International Conference on Software Maintenance (ICSM).