Comparison and evaluation of code clone detection techniques and tools

Over the last decade many techniques and tools for software clone detection have been proposed. In this paper, we provide a qualitative comparison and evaluation of the current state-of-the-art in ...

[1]  Chanchal Kumar Roy,et al.  An Empirical Study of Function Clones in Open Source Software , 2008, 2008 15th Working Conference on Reverse Engineering.

[2]  Andrian Marcus,et al.  Identification of high-level concept clones in source code , 2001, Proceedings 16th Annual International Conference on Automated Software Engineering (ASE 2001).

[3]  Neil Davey,et al.  The development of a software clone detector , 1995 .

[4]  S. Rao Kosaraju Faster Algorithms for the Construction of Parameterized Suffix Trees (Preliminary Version) , 1995, FOCS.

[5]  Leon Moonen,et al.  Generating robust parsers using island grammars , 2001, Proceedings Eighth Working Conference on Reverse Engineering.

[6]  Massimiliano Di Penta,et al.  Clone Analysis in the Web Era: an Approach to Identify Cloned Web Pages , 2001 .

[7]  Rainer Koschke Identifying and Removing Software Clones , 2008, Software Evolution.

[8]  Ying Zou,et al.  Detecting Clones in Business Applications , 2008, 2008 15th Working Conference on Reverse Engineering.

[9]  Christopher W. Pidgeon,et al.  DMS®: Program Transformations for Practical Scalable Software Evolution , 2002, IWPSE '02.

[10]  Giuliano Antoniol,et al.  Linear complexity object-oriented similarity for clone detection and software evolution analyses , 2004, 20th IEEE International Conference on Software Maintenance, 2004. Proceedings..

[11]  J. Howard Johnson,et al.  Identifying redundancy in source code using fingerprints , 1993, CASCON.

[12]  Daqing Hou,et al.  CReN: a tool for tracking copy-and-paste code clones and renaming identifiers consistently in the IDE , 2007, eclipse '07.

[13]  Michel Dagenais,et al.  Extending software quality assessment techniques to Java systems , 1999, Proceedings Seventh International Workshop on Program Comprehension.

[14]  Arie van Deursen,et al.  On the use of clone detection for identifying crosscutting concern code , 2005, IEEE Transactions on Software Engineering.

[15]  Brenda S. Baker,et al.  On finding duplication and near-duplication in large software systems , 1995, Proceedings of 2nd Working Conference on Reverse Engineering.

[16]  Katsuro Inoue,et al.  Very-Large Scale Code Clone Analysis and Visualization of Open Source Programs Using Distributed CCFinder: D-CCFinder , 2007, 29th International Conference on Software Engineering (ICSE'07).

[17]  Dietmar Seipel,et al.  Clone detection in source code by frequent itemset techniques , 2004 .

[18]  M. Di Penta,et al.  Identifying clones in the Linux kernel , 2001, Proceedings First IEEE International Workshop on Source Code Analysis and Manipulation.

[19]  Keith D. Cooper,et al.  Enhanced code compression for embedded RISC processors , 1999, PLDI '99.

[20]  Rainer Koschke,et al.  Survey of Research on Software Clones , 2006, Duplication, Redundancy, and Similarity in Software.

[21]  Magdalena Balazinska,et al.  Measuring clone based reengineering opportunities , 1999, Proceedings Sixth International Software Metrics Symposium (Cat. No.PR00403).

[22]  Philip S. Yu,et al.  GPLAG: detection of software plagiarism by program dependence graph analysis , 2006, KDD '06.

[23]  Stéphane Ducasse,et al.  A language independent approach for detecting duplicated code , 1999, Proceedings IEEE International Conference on Software Maintenance - 1999 (ICSM'99). 'Software Maintenance for Business Change' (Cat. No.99CB36360).

[24]  Zhendong Su,et al.  DECKARD: Scalable and Accurate Tree-Based Detection of Code Clones , 2007, 29th International Conference on Software Engineering (ICSE'07).

[25]  Brenda S. Baker,et al.  Finding Clones with Dup: Analysis of an Experiment , 2007, IEEE Transactions on Software Engineering.

[26]  Brenda S. Baker Parameterized Pattern Matching: Algorithms and Applications , 1996, J. Comput. Syst. Sci..

[27]  Susan Horwitz,et al.  Using Slicing to Identify Duplication in Source Code , 2001, SAS.

[28]  Jeffrey G. Gray,et al.  Phoenix-based clone detection using suffix trees , 2006, ACM-SE 44.

[29]  Serge Demeyer,et al.  Evaluating clone detection techniques from a refactoring perspective , 2004 .

[30]  Elizabeth Burd,et al.  Evaluating clone detection tools for use during preventative maintenance , 2002, Proceedings. Second IEEE International Workshop on Source Code Analysis and Manipulation.

[31]  Giuliano Antoniol,et al.  Investigating large software system evolution: the Linux kernel , 2002, Proceedings 26th Annual International Computer Software and Applications.

[32]  Christopher W. Fraser,et al.  Clone detection via structural abstraction , 2007, 14th Working Conference on Reverse Engineering (WCRE 2007).

[33]  Edward M. McCreight,et al.  A Space-Economical Suffix Tree Construction Algorithm , 1976, JACM.

[34]  Giuseppe Scanniello,et al.  Understanding cloned patterns in Web applications , 2005, 13th International Workshop on Program Comprehension (IWPC'05).

[35]  Koen De Bosschere,et al.  Sifting out the mud: low level C++ code reuse , 2002, OOPSLA '02.

[36]  Shinji Kusumoto,et al.  On detection of gapped code clones using gap locations , 2002, Ninth Asia-Pacific Software Engineering Conference, 2002..

[37]  Bernhard Schätz,et al.  Clone detection in automotive model-based development , 2008, 2008 ACM/IEEE 30th International Conference on Software Engineering.

[38]  Bjorn De Sutter,et al.  Compiler techniques for code compaction , 2000, TOPL.

[39]  James R. Cordy,et al.  The TXL source transformation language , 2006, Sci. Comput. Program..

[40]  Hajimu Iida,et al.  SHINOBI: A real-time code clone detection tool for software maintenance , 2008 .

[41]  Christopher W. Fraser,et al.  Analyzing and compressing assembly code , 1984, SIGPLAN '84.

[42]  Peter E. Bulychev,et al.  Duplicate code detection using anti-unification , 2008 .

[43]  Seunghak Lee,et al.  SDD: high performance code clone detection system for large scale source code , 2005, OOPSLA '05.

[44]  Massimiliano Di Penta,et al.  An approach to identify duplicated web pages , 2002, Proceedings 26th Annual International Computer Software and Applications.

[45]  Kostas Kontogiannis,et al.  Evaluation experiments on the detection of programming patterns using software metrics , 1997, Proceedings of the Fourth Working Conference on Reverse Engineering.

[46]  Shinji Kusumoto,et al.  CCFinder: A Multilinguistic Token-Based Code Clone Detection System for Large Scale Source Code , 2002, IEEE Trans. Software Eng..

[47]  Rainer Koschke,et al.  Vergleich von Techniken zur Erkennung duplizierten Quellcodes , 2002 .

[48]  Giuliano Antoniol,et al.  Analyzing cloning evolution in the Linux kernel , 2002, Inf. Softw. Technol..

[49]  Huiqing Li,et al.  Clone detection and removal for Erlang/OTP within a refactoring environment , 2009, PEPM '09.

[50]  Martin Fowler,et al.  Refactoring - Improving the Design of Existing Code , 1999, Addison Wesley object technology series.

[51]  Giuliano Antoniol,et al.  Modeling clones evolution through time series , 2001, Proceedings IEEE International Conference on Software Maintenance. ICSM 2001.

[52]  Zhiyi Ma,et al.  Detecting Duplications in Sequence Diagrams Based on Suffix Trees , 2006, 2006 13th Asia Pacific Software Engineering Conference (APSEC'06).

[53]  Maziar Gomrokchi,et al.  Source code enhancement using reduction of duplicated code , 2007 .

[54]  Ettore Merlo,et al.  Experiment on the automatic detection of function clones in a software system using metrics , 1996, 1996 Proceedings of International Conference on Software Maintenance.

[55]  Filippo Lanubile,et al.  Finding function clones in Web applications , 2003, Seventh European Conference onSoftware Maintenance and Reengineering, 2003. Proceedings..

[56]  Udi Manber,et al.  Finding Similar Files in a Large File System , 1994, USENIX Winter.

[57]  Filippo Lanubile,et al.  Function Clone Detection in Web Applications: A Semiautomated Approach , 2004, J. Web Eng..

[58]  Chanchal Kumar Roy,et al.  Scenario-Based Comparison of Clone Detection Techniques , 2008, 2008 16th IEEE International Conference on Program Comprehension.

[59]  Nicholas Tran,et al.  Sim: a utility for detecting similarity in computer programs , 1999, SIGCSE '99.

[60]  Chanchal Kumar Roy,et al.  NICAD: Accurate Detection of Near-Miss Intentional Clones Using Flexible Pretty-Printing and Code Normalization , 2008, 2008 16th IEEE International Conference on Program Comprehension.

[61]  Matthias Rieger,et al.  Effective Clone Detection Without Language Barriers , 2005 .

[62]  J. Howard Johnson,et al.  Substring matching for clone detection and change tracking , 1994, Proceedings 1994 International Conference on Software Maintenance.

[63]  Miryung Kim,et al.  An ethnographic study of copy and paste programming practices in OOPL , 2004, Proceedings. 2004 International Symposium on Empirical Software Engineering, 2004. ISESE '04..

[64]  Rainer Koschke,et al.  Clone Detection Using Abstract Syntax Suffix Trees , 2006, 2006 13th Working Conference on Reverse Engineering.

[65]  Paolo Tonella,et al.  Using clustering to support the migration from static to dynamic web pages , 2003, 11th IEEE International Workshop on Program Comprehension, 2003..

[66]  Stan Jarzabek,et al.  Using Server Pages to Unify Clones in Web Applications: A Trade-Off Analysis , 2007, 29th International Conference on Software Engineering (ICSE'07).

[67]  Radu Marinescu,et al.  Archeology of code duplication: recovering duplication chains from small duplication fragments , 2005, Seventh International Symposium on Symbolic and Numeric Algorithms for Scientific Computing (SYNASC'05).

[68]  Renato De Mori,et al.  Pattern matching for clone and concept detection , 2004, Automated Software Engineering.

[69]  Miryung Kim,et al.  An empirical study of code clone genealogies , 2005, ESEC/FSE-13.

[70]  Shinji Kusumoto,et al.  On Software Maintenance Process Improvement Based on Code Clone Analysis , 2002, PROFES.

[71]  Rainer Koschke,et al.  Empirical evaluation of clone detection using syntax suffix trees , 2008, Empirical Software Engineering.

[72]  Michael W. Godfrey,et al.  Aiding comprehension of cloning through categorization , 2004 .

[73]  Raffaele Giancarlo,et al.  Sparse Dynamic Programming for Longest Common Subsequence from Fragments , 2002, J. Algorithms.

[74]  Yuanyuan Zhou,et al.  CP-Miner: finding copy-paste and related bugs in large-scale software code , 2006, IEEE Transactions on Software Engineering.