Examining the effectiveness of using concolic analysis to detect code clones

During the initial construction and subsequent maintenance of an application, functionality is commonly duplicated, whether intentionally or not. This replicated functionality, known as a code clone, has diverse causes and can have moderate to severe adverse effects on a software project. A code clone is defined as a set of two or more code fragments that produce similar results when given the same input. Although many powerful clone detection tools exist, most suffer from drawbacks, the most important being an inability to accurately and reliably detect the more difficult clone types. This paper presents a new technique for detecting code clones based on concolic analysis, which uses a mixture of concrete and symbolic values to traverse a large and diverse portion of the source code. By performing concolic analysis on the targeted source code and then examining the holistic output for similarities, code clone candidates can be consistently identified. We found that concolic analysis accurately and reliably discovered all four types of code clones, with an average precision of 0.8, recall of 0.91, F-score of 0.85, and accuracy of 0.99.
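
The comparison step described above can be illustrated with a minimal sketch. It assumes that the concolic output of each code fragment is available as a list of text lines (for example, normalized path conditions) and that similarity is measured with the Python standard library's difflib.SequenceMatcher; both the sample outputs and the 0.9 threshold are hypothetical choices for illustration, not the tool or settings used in this work. The sketch also shows the F-score as the harmonic mean of precision and recall, which is consistent with the figures reported above.

```python
from difflib import SequenceMatcher
from typing import Sequence


def output_similarity(a: Sequence[str], b: Sequence[str]) -> float:
    """Return a ratio in [0, 1] describing how many lines two outputs share."""
    # SequenceMatcher compares the two sequences element by element (line by line).
    return SequenceMatcher(None, a, b).ratio()


def is_clone_candidate(a: Sequence[str], b: Sequence[str],
                       threshold: float = 0.9) -> bool:
    """Flag a pair as a clone candidate; the threshold is a hypothetical choice."""
    return output_similarity(a, b) >= threshold


def f_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (the F1 score)."""
    return 2 * precision * recall / (precision + recall)


# Hypothetical concolic outputs for three fragments (illustrative only).
out_a = ["x > 0", "x <= 10", "return x + 1"]
out_b = ["x > 0", "x <= 10", "return x + 1"]  # same behavior as out_a
out_c = ["y == 0", "return 0"]                # unrelated behavior

print(is_clone_candidate(out_a, out_b))  # identical outputs
print(is_clone_candidate(out_a, out_c))  # no shared lines
print(round(f_score(0.8, 0.91), 2))      # precision and recall reported above
```

Comparing line sequences rather than raw source text is what lets fragments with different identifiers or layout still match, provided the analysis output is normalized first; how that normalization is done is outside the scope of this sketch.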
