An extended assessment of type-3 clones as detected by state-of-the-art tools

Code reuse through copying and pasting leads to so-called software clones. These clones can be roughly categorized into identical fragments (type-1 clones), fragments with parameter substitution (type-2 clones), and similar fragments that differ through modified, deleted, or added statements (type-3 clones). Although there has been extensive research on detecting clones, the detection of type-3 clones is still an open research issue due to the inherent vagueness of their definition. In this paper, we analyze type-3 clones detected by state-of-the-art tools and investigate them in terms of their syntactic differences. Then, we derive their underlying semantic abstractions from these syntactic differences. Finally, we investigate whether there are code characteristics that indicate that a tool-suggested clone candidate is a real type-3 clone from a human’s perspective. Our findings can help developers of clone detectors and clone refactoring tools to improve them.
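The three clone types can be made concrete with a small sketch. The Python code below is not the method of this paper or of any tool it evaluates; it is a hypothetical illustration that assumes a naive tokenizer, treats a pair as type-2 when the token sequences match after identifiers and literals are replaced by placeholders, and treats a pair as type-3 when a difflib similarity ratio reaches an arbitrary threshold of 0.7. The helper names (`tokenize`, `normalize`, `clone_type`) and the threshold are assumptions made for illustration only.

```python
import difflib
import keyword
import re
from typing import List, Optional

# Naive tokenizer (illustrative assumption, not a real detector's lexer):
# string literals, identifiers, numbers, then any single non-space character.
TOKEN_RE = re.compile(r"""'[^']*'|"[^"]*"|[A-Za-z_]\w*|\d+(?:\.\d+)?|\S""")
IDENT_RE = re.compile(r"[A-Za-z_]\w*")
NUMBER_RE = re.compile(r"\d+(?:\.\d+)?")


def tokenize(code: str) -> List[str]:
    """Split Python-like code into tokens, ignoring layout and '#' comments.

    Comment stripping is naive: a '#' inside a string literal is also cut.
    """
    code = re.sub(r"#[^\n]*", "", code)
    return TOKEN_RE.findall(code)


def normalize(tokens: List[str]) -> List[str]:
    """Keep keywords and punctuation; map identifiers to ID, literals to LIT."""
    out = []
    for tok in tokens:
        if keyword.iskeyword(tok):
            out.append(tok)
        elif IDENT_RE.fullmatch(tok):
            out.append("ID")
        elif NUMBER_RE.fullmatch(tok) or tok[0] in "'\"":
            out.append("LIT")
        else:
            out.append(tok)
    return out


def clone_type(a: str, b: str, threshold: float = 0.7) -> Optional[int]:
    """Classify a fragment pair as a type-1, type-2, or type-3 clone, or None.

    The 0.7 threshold is a hypothetical cut-off chosen for illustration.
    """
    ta, tb = tokenize(a), tokenize(b)
    if ta == tb:  # identical apart from whitespace and comments
        return 1
    na, nb = normalize(ta), normalize(tb)
    if na == nb:  # identical after renaming identifiers and literals
        return 2
    # Near-miss: similar enough despite added, deleted, or changed tokens.
    ratio = difflib.SequenceMatcher(None, na, nb).ratio()
    return 3 if ratio >= threshold else None


if __name__ == "__main__":
    original = "def add(a, b):\n    return a + b\n"
    print(clone_type(original, "def add(a, b):  # sum\n    return a+b\n"))  # 1
    print(clone_type(original, "def plus(x, y):\n    return x + y\n"))  # 2
    print(clone_type(original, "def add(a, b):\n    print(a)\n    return a + b\n"))  # 3
    print(clone_type(original, "x = [i * i for i in range(10)]\n"))  # None
```

Types 1 and 2 reduce to exact equality checks, while type-3 depends entirely on the similarity measure and the threshold. Moving that threshold changes which near-miss pairs are reported, which is one concrete way the vagueness of the type-3 definition shows up in practice.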
