Evaluating Modern Clone Detection Tools

Many clone detection tools and techniques have been introduced in the literature, and these tools have been used to manage clones and study their effects on software maintenance and evolution. However, the performance of these modern tools is not well known, especially their recall. In this paper, we evaluate and compare the recall of eleven modern clone detection tools using four benchmark frameworks: (1) Bellon's Framework, (2) our modification of Bellon's Framework to improve the accuracy of its clone matching metrics, (3) Murakami et al.'s extension of Bellon's Framework, which adds type 3 gap awareness to the framework, and (4) our Mutation and Injection Framework. Bellon's Framework uses a curated corpus of manually validated clones detected by tools contemporary to 2002. In contrast, our Mutation and Injection Framework synthesizes a corpus of artificial clones using a cloning taxonomy produced in 2009. Although Bellon's corpus is still very popular in the clone community, there is some concern that it may not be accurate for modern clone detection tools. We investigate the accuracy of the frameworks by (1) checking for anomalies in their results, (2) checking for agreement between the frameworks, and (3) checking for agreement with our expectations of these tools. Our expectations are researched and flexible. Although they may contain inaccuracies, they are valuable for identifying possible inaccuracies in a benchmark. We find anomalies in the results of Bellon's Framework, as well as disagreement with both our expectations and the Mutation and Injection Framework. We conclude that Bellon's Framework may not be accurate for modern tools, and that an update of its corpus with clones detected by the modern tools is warranted. The results of the Mutation and Injection Framework agree with our expectations in most cases. We suggest that it is a good solution for evaluating modern tools.
