Detecting Refactored Clones

The availability of automated refactoring tools in modern development environments allows programmers to refactor their code with ease. Such tools, however, enable developers to inadvertently create code clones that quickly diverge in form but not in meaning. Furthermore, in the hands of those looking to confuse plagiarism-detection tools, automated refactoring may be abused to avoid discovery of copied code. We present Cider, an algorithm that can detect code clones regardless of various refactorings that may have been applied to some of the copies but not to others. Most significant is the ability to discover interprocedural clones, where parts of one copy have been extracted to separate methods. We evaluated Cider on several open-source Java projects, attempting to detect interprocedural clones between successive versions of each project. Interprocedural clones were detected in all evaluated projects, demonstrating the pervasive nature of the problem. Compared to a manual assessment, Cider performed well in terms of both recall and precision.
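To make the notion of an interprocedural clone concrete, the following minimal sketch shows two copies of the same computation, one of which has had Extract Method applied. The class and method names (InterproceduralCloneExample, totalOriginal, totalRefactored, sumOf, applyDiscount) are hypothetical and chosen only for illustration. The sketch shows the kind of clone pair the abstract describes; it does not show how Cider detects it.

```java
public class InterproceduralCloneExample {

    // Original copy: the whole computation lives in a single method.
    public static int totalOriginal(int[] prices, int discountPercent) {
        int sum = 0;
        for (int p : prices) {
            sum += p;
        }
        return sum - (sum * discountPercent) / 100;
    }

    // Refactored copy: same meaning, but parts were extracted into
    // separate methods, so the clone now spans several procedures.
    public static int totalRefactored(int[] prices, int discountPercent) {
        int sum = sumOf(prices);
        return applyDiscount(sum, discountPercent);
    }

    // Extracted helper (hypothetical): the summation loop.
    public static int sumOf(int[] values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Extracted helper (hypothetical): the discount arithmetic.
    public static int applyDiscount(int amount, int percent) {
        return amount - (amount * percent) / 100;
    }

    public static void main(String[] args) {
        int[] prices = {100, 250, 50};
        System.out.println(totalOriginal(prices, 10));   // prints 360
        System.out.println(totalRefactored(prices, 10)); // prints 360
    }
}
```

A detector that compares one method body at a time would see little textual or structural overlap between totalOriginal and totalRefactored, even though the two compute the same result.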
