Comparison and Evaluation of Clone Detection Tools

Many techniques for detecting duplicated source code (software clones) have been proposed. However, it is not yet clear how these techniques compare in terms of recall and precision, or in their space and time requirements. This paper presents an experiment that evaluates six clone detectors on eight large C and Java programs (almost 850 KLOC in total). The clone candidates they reported were evaluated by one of the authors, acting as an independent third party. The selected techniques cover the whole spectrum of the state of the art in clone detection: they work on text, lexical and syntactic information, software metrics, and program dependence graphs.
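
As a rough illustration of the two quality measures named above, the sketch below computes precision and recall for a set of clone candidates against a reference set of confirmed clones. It is a minimal sketch under stated assumptions, not the evaluation procedure of the paper: the clone-pair representation, the `precision_recall` helper, and the use of exact matching between candidates and references are all hypothetical choices made for this example.

```python
from typing import FrozenSet, Set, Tuple

# Hypothetical representation: a code fragment is (file, start_line, end_line),
# and a clone pair is an unordered pair of two fragments.
Fragment = Tuple[str, int, int]
ClonePair = FrozenSet[Fragment]


def precision_recall(candidates: Set[ClonePair],
                     reference: Set[ClonePair]) -> Tuple[float, float]:
    """Return (precision, recall) of candidates against a reference set.

    Assumption: a candidate counts as correct only if it matches a
    reference pair exactly; a real evaluation may accept partial overlap.
    """
    hits = candidates & reference
    precision = len(hits) / len(candidates) if candidates else 0.0
    recall = len(hits) / len(reference) if reference else 0.0
    return precision, recall


def pair(a: Fragment, b: Fragment) -> ClonePair:
    """Build an unordered clone pair from two fragments."""
    return frozenset({a, b})


# Made-up example data: five confirmed clone pairs, four reported candidates,
# two of which are confirmed.
reference = {pair(("a.c", i, i + 9), ("b.c", i, i + 9)) for i in range(0, 50, 10)}
candidates = {
    pair(("a.c", 0, 9), ("b.c", 0, 9)),
    pair(("a.c", 10, 19), ("b.c", 10, 19)),
    pair(("c.c", 1, 8), ("d.c", 1, 8)),
    pair(("c.c", 20, 30), ("d.c", 40, 50)),
}

precision, recall = precision_recall(candidates, reference)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Precision here is the share of reported candidates that are confirmed clones, and recall is the share of confirmed clones that the tool reported; a detector can trade one for the other, which is why both are needed for a comparison.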
