An Empirical Study of Clone Disappearances on Open Source Software Projects

Code clones have been well studied because their presence is regarded as a bad smell for software maintenance. On the other hand, creating code clones has a positive aspect: reusing existing code through copy-and-paste operations enables rapid development of software systems. Thus, it is not realistic to rid software systems of clones; instead, efficient clone management is required. Against this background, many researchers have studied clone appearance and clone evolution. However, no research has revealed how clones disappear. Revealing why clones disappear and what characteristics disappeared clones have could promote efficient clone management. This paper proposes an investigation method for clone disappearances and reports an empirical study conducted on open source software systems. Our experimental results show that clone disappearances occur frequently. Moreover, disappeared clones tend to consist of fewer code fragments and to be less complex than non-disappeared ones.
