Code Similarities Beyond Copy & Paste

Redundant source code hinders software maintenance, since updates have to be performed in multiple places. This holds independent of whether redundancy was created by copy&paste or by independent development of behaviorally similar code. Existing clone detection tools successfully discover syntactically similar redundant code. They thus work well for redundancy that has been created by copy&paste. But: how syntactically similar is behaviorally similar code of independent origin? This paper presents the results of a controlled experiment that demonstrates that behaviorally similar code of independent origin is highly unlikely to be syntactically similar. In fact, it is so syntactically different, that existing clone detection approaches cannot identify more than 1% of such redundancy. This is unfortunate, as manual inspections of open source software indicate that behaviorally similar code of independent origin does exist in practice and does present problems to maintenance.

[1]  Chanchal K. Roy,et al.  A Survey on Software Clone Detection Research , 2007 .

[2]  Andrew Walenstein Code Clones: Reconsidering Terminology , 2006, Duplication, Redundancy, and Similarity in Software.

[3]  Elmar Jürgens,et al.  Tool Support for Continuous Quality Control , 2008, IEEE Software.

[4]  G. G. Stokes "J." , 1890, The New Yale Book of Quotations.

[5]  Anas N. Al-Rabadi,et al.  A comparison of modified reconstructability analysis and Ashenhurst‐Curtis decomposition of Boolean functions , 2004 .

[6]  Hoan Anh Nguyen,et al.  Graph-based mining of multiple object usage patterns , 2009, ESEC/FSE '09.

[7]  David F. McAllister,et al.  An Experimental Evaluation of Software Redundancy as a Strategy For Improving Reliability , 1991, IEEE Trans. Software Eng..

[8]  Ettore Merlo,et al.  Experiment on the automatic detection of function clones in a software system using metrics , 1996, 1996 Proceedings of International Conference on Software Maintenance.

[9]  Zhendong Su,et al.  Scalable detection of semantic clones , 2008, 2008 ACM/IEEE 30th International Conference on Software Engineering.

[10]  Tom Mens,et al.  A survey of software refactoring , 2004, IEEE Transactions on Software Engineering.

[11]  Linda M. Wills,et al.  Flexible control for program recognition , 1993, [1993] Proceedings Working Conference on Reverse Engineering.

[12]  Elmar Jürgens,et al.  CloneDetective - A workbench for clone detection research , 2009, 2009 IEEE 31st International Conference on Software Engineering.

[13]  Zhendong Su,et al.  DECKARD: Scalable and Accurate Tree-Based Detection of Code Clones , 2007, 29th International Conference on Software Engineering (ICSE'07).

[14]  Renato De Mori,et al.  Pattern matching for clone and concept detection , 2004, Automated Software Engineering.

[15]  Iu. I. Ianov,et al.  On the equivalence and transformation of program schemes , 1958, CACM.

[16]  Zhendong Su,et al.  Context-based detection of clone-related bugs , 2007, ESEC-FSE '07.

[17]  Yun Yang,et al.  Problems creating task-relevant clone detection reference data , 2003, 10th Working Conference on Reverse Engineering, 2003. WCRE 2003. Proceedings..

[18]  Michael W. Godfrey,et al.  Cloning by accident: an empirical study of source code cloning across software systems , 2005, 2005 International Symposium on Empirical Software Engineering, 2005..

[19]  Denis Barthou,et al.  Algorithm recognition based on demand-driven data-.ow analysis , 2003, 10th Working Conference on Reverse Engineering, 2003. WCRE 2003. Proceedings..

[20]  Florian Deissenboeck,et al.  Clone Detection Beyond Copy & Paste , 2009 .

[21]  Mohammad El-Ramly,et al.  Similarity in Programs , 2006, Duplication, Redundancy, and Similarity in Software.

[22]  Zhendong Su,et al.  Automatic mining of functionally equivalent code fragments via random testing , 2009, ISSTA.

[23]  Stephan Merz,et al.  Model Checking , 2000 .

[24]  Elmar Jürgens,et al.  Do code clones matter? , 2009, 2009 IEEE 31st International Conference on Software Engineering.

[25]  Martin P. Robillard,et al.  Improving API Usage through Automatic Detection of Redundant Code , 2009, 2009 IEEE/ACM International Conference on Automated Software Engineering.

[26]  Rainer Koschke,et al.  Survey of Research on Software Clones , 2006, Duplication, Redundancy, and Similarity in Software.

[27]  Andrian Marcus,et al.  Identification of high-level concept clones in source code , 2001, Proceedings 16th Annual International Conference on Automated Software Engineering (ASE 2001).