An empirical study of code clone genealogies

It has been broadly assumed that code clones are inherently bad and that eliminating clones by refactoring would solve the problems of code clones. To investigate the validity of this assumption, we developed a formal definition of clone evolution and built a clone genealogy tool that automatically extracts the history of code clones from a source code repository. Using our tool, we extracted clone genealogy information for two Java open source projects and analyzed their evolution. Our study contradicts some conventional wisdom about clones. In particular, refactoring may not always improve software with respect to clones, for two reasons. First, many code clones exist in the system for only a short time; extensive refactoring of such short-lived clones may not be worthwhile if they are likely to diverge from one another very soon. Second, many clones, especially long-lived clones that have changed consistently with other elements in the same group, are not easily refactorable due to programming language limitations. These insights show that refactoring will not help in dealing with some types of clones, and they open up opportunities for complementary clone maintenance tools that target these other classes of clones.
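
The two distinctions the abstract relies on, short-lived versus long-lived clones and consistent versus diverging change, can be illustrated with a small sketch. This is not the authors' formal definition or their tool. It assumes that a clone group has already been matched across consecutive versions, and that each member fragment is compared by its normalized text. The snapshot representation, the labels `same`/`consistent`/`inconsistent`, the fragment identifiers, and the `short_lived_max` threshold are all hypothetical choices made for illustration.

```python
from typing import Dict, List

# Hypothetical model: one snapshot of a clone group in one version,
# mapping a fragment id (e.g. "File.java:line") to its normalized text.
Snapshot = Dict[str, str]


def classify_transition(old: Snapshot, new: Snapshot) -> str:
    """Label how a clone group changed between two consecutive versions.

    Assumes at least one fragment persists across the two versions.
    """
    shared = old.keys() & new.keys()
    changed = {f for f in shared if old[f] != new[f]}
    if not changed:
        return "same"
    new_texts = {new[f] for f in shared}
    if changed == shared and len(new_texts) == 1:
        # Every member was edited, and the edits kept the members identical.
        return "consistent"
    # Some members were edited differently, or not at all: the clones diverged.
    return "inconsistent"


def summarize(genealogy: List[Snapshot], short_lived_max: int = 3) -> dict:
    """Summarize a genealogy given as one snapshot per consecutive version."""
    labels = [classify_transition(a, b) for a, b in zip(genealogy, genealogy[1:])]
    lifetime = len(genealogy)  # number of versions in which the group was seen
    return {
        "transitions": labels,
        "lifetime": lifetime,
        "short_lived": lifetime <= short_lived_max,
        "always_consistent": all(label in ("same", "consistent") for label in labels),
    }


# Usage: a two-member clone group that changes consistently, then diverges.
history = [
    {"A.java:10": "x = a + b;", "B.java:20": "x = a + b;"},
    {"A.java:10": "x = a + b + c;", "B.java:20": "x = a + b + c;"},
    {"A.java:10": "x = a * b;", "B.java:20": "x = a + b + c;"},
]
print(summarize(history))
```

In this toy history the group is edited consistently once and then diverges within three versions. Under the assumed threshold it counts as short-lived, which is the kind of clone the abstract argues may not be worth refactoring.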