Distinguishing copies from originals in software clones

Cloning is widespread in today's systems where automated assistance is required to locate cloned code. Although the evolution of clones has been studied for many years, no attempt has been made so far to automatically distinguish the original source code leading to cloned copies. This paper presents an approach to classify the clones of a clone pair based on the version information available in version control systems. This automatic classification attempts to distinguish the original from the copy. It allows for the fact that the clones may be modified and thus consist of lines coming from different versions. An evaluation, based on two case studies, shows that when comments are ignored and a small tolerance is accepted, for the majority of clone pairs the proposed approach can automatically distinguish between the original and the copy.

[1]  Stephan Diehl,et al.  Identifying Refactorings from Source-Code Changes , 2006, 21st IEEE/ACM International Conference on Automated Software Engineering (ASE'06).

[2]  Michael W. Godfrey,et al.  "Cloning Considered Harmful" Considered Harmful , 2006, 2006 13th Working Conference on Reverse Engineering.

[3]  Jens Krinke,et al.  Is Cloned Code More Stable than Non-cloned Code? , 2008, 2008 Eighth IEEE International Working Conference on Source Code Analysis and Manipulation.

[4]  Daniel M. Germán,et al.  Code siblings: Technical and legal implications of copying code between applications , 2009, 2009 6th IEEE International Working Conference on Mining Software Repositories.

[5]  Mark Harman,et al.  KClone: A Proposed Approach to Fast Precise Code Clone Detection , 2009 .

[6]  James R. Cordy,et al.  Comprehending reality - practical barriers to industrial adoption of software maintenance automation , 2003, 11th IEEE International Workshop on Program Comprehension, 2003..

[7]  Brenda S. Baker,et al.  On finding duplication and near-duplication in large software systems , 1995, Proceedings of 2nd Working Conference on Reverse Engineering.

[8]  Shinji Kusumoto,et al.  CCFinder: A Multilinguistic Token-Based Code Clone Detection System for Large Scale Source Code , 2002, IEEE Trans. Software Eng..

[9]  Dean W. Gonzalez,et al.  “=” considered harmful , 1991, ALET.

[10]  Jens Krinke,et al.  A Study of Consistent and Inconsistent Changes to Code Clones , 2007, 14th Working Conference on Reverse Engineering (WCRE 2007).

[11]  Gerardo Canfora,et al.  Identifying Changed Source Code Lines from Version Repositories , 2007, Fourth International Workshop on Mining Software Repositories (MSR'07:ICSE Workshops 2007).

[12]  Harald C. Gall,et al.  Relation of Code Clones and Change Couplings , 2006, FASE.

[13]  Kostas Kontogiannis,et al.  Evaluation experiments on the detection of programming patterns using software metrics , 1997, Proceedings of the Fourth Working Conference on Reverse Engineering.

[14]  Susan Horwitz,et al.  Using Slicing to Identify Duplication in Source Code , 2001, SAS.

[15]  Ettore Merlo,et al.  Experiment on the automatic detection of function clones in a software system using metrics , 1996, 1996 Proceedings of International Conference on Software Maintenance.

[16]  Giuliano Antoniol,et al.  Comparison and Evaluation of Clone Detection Tools , 2007, IEEE Transactions on Software Engineering.

[17]  Lerina Aversano,et al.  How Clones are Maintained: An Empirical Study , 2007, 11th European Conference on Software Maintenance and Reengineering (CSMR'07).

[18]  Yue Jia,et al.  Cloning and copying between GNOME projects , 2010, 2010 7th IEEE Working Conference on Mining Software Repositories (MSR 2010).

[19]  Chanchal Kumar Roy,et al.  NICAD: Accurate Detection of Near-Miss Intentional Clones Using Flexible Pretty-Printing and Code Normalization , 2008, 2008 16th IEEE International Conference on Program Comprehension.

[20]  Hagen Hagen Is Cloned Code more stable than Non-Cloned Code? , 2008 .

[21]  Miryung Kim,et al.  An empirical study of code clone genealogies , 2005, ESEC/FSE-13.

[22]  Harald C. Gall,et al.  Change Distilling:Tree Differencing for Fine-Grained Source Code Change Extraction , 2007, IEEE Transactions on Software Engineering.

[23]  Lerina Aversano,et al.  An empirical study on the maintenance of source code clones , 2010, Empirical Software Engineering.

[24]  Nils Göde,et al.  Evolution of Type-1 Clones , 2009, 2009 Ninth IEEE International Working Conference on Source Code Analysis and Manipulation.

[25]  Christian Prause Maintaining Fine-Grained Code Metadata Regardless of Moving, Copying and Merging , 2009, 2009 Ninth IEEE International Working Conference on Source Code Analysis and Manipulation.

[26]  Stéphane Ducasse,et al.  A language independent approach for detecting duplicated code , 1999, Proceedings IEEE International Conference on Software Maintenance - 1999 (ICSM'99). 'Software Maintenance for Business Change' (Cat. No.99CB36360).

[27]  Gerardo Canfora,et al.  Tracking Your Changes: A Language-Independent Approach , 2009, IEEE Software.

[28]  Jens Krinke,et al.  Identifying similar code with program dependence graphs , 2001, Proceedings Eighth Working Conference on Reverse Engineering.

[29]  Michael W. Godfrey,et al.  Using origin analysis to detect merging and splitting of source code entities , 2005, IEEE Transactions on Software Engineering.