How Accurate Is Coarse-grained Clone Detection?: Comparison with Fine-grained Detectors

Research on clone detection has been quite successful over the past two decades, producing a number of state-of-the-art clone detectors. However, even with such detectors, it is still challenging to detect clones across multiple projects, or over thousands of revisions of code, in limited time. A simple, coarse-grained detector is an alternative to detectors that use fine-grained analysis. It can drastically reduce the time required for detection, although it may miss some of the clones that fine-grained detectors find. Hence, it should be adequate for a tentative analysis of clones if its accuracy is acceptable. However, it is not clear how accurate such a coarse-grained approach is. This paper evaluates the accuracy of a coarse-grained clone detector in comparison with several fine-grained clone detectors. Our experiment provides empirical evidence that such a coarse-grained approach achieves acceptable accuracy. We therefore conclude that coarse-grained detection is adequate for summarizing clones and as a starting point for detailed analyses, including manual inspection and bug detection.
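To illustrate why a coarse-grained approach can be so much cheaper, the following is a minimal sketch, not the detector evaluated in this paper. It assumes that code is split into whole units (for example, methods), that each unit is normalized only by stripping C-style comments and collapsing whitespace, and that units with identical hashes form a clone set. All function names (`normalize`, `detect_coarse_clones`, `clone_pairs`, `precision_recall`) are hypothetical. The last two functions show one assumed way to measure agreement with a fine-grained detector: treat its output as a reference set of clone pairs and compute precision and recall.

```python
import hashlib
import itertools
import re
from collections import defaultdict


def normalize(code: str) -> str:
    # Hypothetical, deliberately crude normalization: drop /* */ and // comments
    # (naively, so "//" inside string literals is also stripped), then collapse
    # all runs of whitespace into single spaces.
    code = re.sub(r"/\*.*?\*/", "", code, flags=re.DOTALL)
    code = re.sub(r"//[^\n]*", "", code)
    return " ".join(code.split())


def detect_coarse_clones(units: dict[str, str]) -> list[list[str]]:
    """Group whole code units whose normalized text hashes identically.

    One hash per unit plus a dictionary lookup gives roughly linear time,
    which is the source of the speed-up over fine-grained comparison.
    """
    buckets: dict[str, list[str]] = defaultdict(list)
    for name, body in units.items():
        digest = hashlib.sha1(normalize(body).encode("utf-8")).hexdigest()
        buckets[digest].append(name)
    # A clone set is any bucket that holds two or more units.
    return sorted(sorted(names) for names in buckets.values() if len(names) > 1)


def clone_pairs(clone_sets: list[list[str]]) -> set[tuple[str, str]]:
    """Expand each clone set into all unordered pairs (stored sorted)."""
    pairs: set[tuple[str, str]] = set()
    for members in clone_sets:
        for a, b in itertools.combinations(sorted(members), 2):
            pairs.add((a, b))
    return pairs


def precision_recall(found: set, reference: set) -> tuple[float, float]:
    """Precision and recall of coarse-grained pairs against reference pairs."""
    hits = len(found & reference)
    precision = hits / len(found) if found else 0.0
    recall = hits / len(reference) if reference else 0.0
    return precision, recall


if __name__ == "__main__":
    units = {
        "A.foo": "int foo() { return x + 1; } // inc",
        "B.bar": "int foo() {\n  return x + 1;\n}",
        "C.baz": "int baz() { return y * 2; }",
    }
    found = clone_pairs(detect_coarse_clones(units))
    # Hypothetical reference pairs standing in for a fine-grained detector's output.
    reference = {("A.foo", "B.bar"), ("A.foo", "C.baz")}
    print(found, precision_recall(found, reference))
```

In this toy run, `A.foo` and `B.bar` differ only in layout and comments, so they land in the same bucket, while a reference pair that is not textually identical after normalization is missed. This mirrors the trade-off described above: whole-unit hashing is fast but cannot find near-miss clones that a fine-grained detector may report, so recall against the reference drops.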