Studying the Impact of Clones on Software Defects

Numerous studies examine whether cloned code is harmful to software systems, yet few examine which characteristics of cloned code lead to software defects. In this work, we use survival analysis to understand the impact of clones on software defects and to determine which characteristics of cloned code have the highest impact. Our survival models express the risk of defects in terms of basic predictors inherent to the code (e.g., LOC) and cloning predictors (e.g., number of clone siblings). We perform a case study on two large, long-lived systems using two clone detection tools. We find that the defect-proneness of cloned methods is specific to the system under study and that more resources should be directed towards methods with a longer 'commit history'.
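To illustrate how a survival model can express defect risk in terms of predictors, here is a minimal sketch. It assumes a proportional-hazards form, in which each predictor multiplies a baseline hazard by exp(coefficient × value); the abstract does not state which survival model was fitted. The function name `relative_hazard`, the predictor names, and the coefficient values are all hypothetical and are not results from the study.

```python
import math

def relative_hazard(predictors, coefficients):
    """Return exp(sum(beta_i * x_i)): the hazard multiplier relative to the baseline."""
    linear = sum(coefficients[name] * value for name, value in predictors.items())
    return math.exp(linear)

# Hypothetical coefficients chosen for illustration only (not from the study).
coefficients = {"loc": 0.01, "clone_siblings": 0.2}

# Two methods of equal size that differ only in their number of clone siblings.
method_a = {"loc": 50, "clone_siblings": 0}
method_b = {"loc": 50, "clone_siblings": 3}

# The baseline hazard cancels in the ratio, so only the predictors that differ matter.
ratio = relative_hazard(method_b, coefficients) / relative_hazard(method_a, coefficients)
print(f"Hazard ratio of method_b to method_a: {ratio:.3f}")
```

Under these assumed coefficients, the ratio depends only on the cloning predictor, since the basic predictor (LOC) is the same for both methods. This is the sense in which such a model separates the contribution of basic and cloning predictors to defect risk.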
