Challenges in Behavioral Code Clone Detection

When software engineering researchers discuss "similar" code, they often mean code that static analysis determines to be textually, syntactically, or structurally similar, known as code clones (code that looks alike). Ideally, we would also like to include code that is behaviorally or functionally similar, even if it looks completely different. The state of the art in detecting such clones focuses on checking the functional equivalence of code fragments' inputs and outputs, regardless of their internal behavior. We argue that advancing dynamic code clone detection toward behavioral clones (i.e., fragments with similar execution behavior, not merely matching input and output states) can greatly broaden the applications of behavioral clones for general program understanding tasks.
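To make the distinction between input/output equivalence and execution behavior concrete, here is a minimal, hypothetical Python sketch. It is an illustration only, not the method proposed in this work. `sum_loop` and `sum_formula` count as functional clones because random testing finds identical outputs, yet their recorded operation traces barely overlap. The hand-written trace format and the multiset Jaccard measure in `trace_similarity` are illustrative assumptions, not a real behavioral clone detector.

```python
import random
from collections import Counter
from typing import Callable, List, Tuple

# Each hypothetical fragment returns (output, trace). The trace is a crude,
# hand-instrumented record of executed operations (an assumption for illustration).
Fragment = Callable[[int], Tuple[int, List[str]]]


def sum_loop(n: int) -> Tuple[int, List[str]]:
    """Sum 1..n with an explicit loop: one 'add' per iteration."""
    total, trace = 0, []
    for i in range(1, n + 1):
        total += i
        trace.append("add")
    return total, trace


def sum_formula(n: int) -> Tuple[int, List[str]]:
    """Sum 1..n with the closed-form formula n(n+1)/2: constant work."""
    return n * (n + 1) // 2, ["mul", "add", "div"]


def io_equivalent(f: Fragment, g: Fragment, trials: int = 100, seed: int = 0) -> bool:
    """Functional-clone check: compare outputs only, on random inputs."""
    rng = random.Random(seed)
    for _ in range(trials):
        n = rng.randint(0, 1000)
        if f(n)[0] != g(n)[0]:
            return False
    return True


def trace_similarity(f: Fragment, g: Fragment, n: int) -> float:
    """Behavioral check: multiset Jaccard similarity of executed operations."""
    a, b = Counter(f(n)[1]), Counter(g(n)[1])
    union = sum((a | b).values())  # per-op maximum counts
    if union == 0:
        return 1.0  # neither fragment executed anything
    return sum((a & b).values()) / union  # per-op minimum counts over maximum


if __name__ == "__main__":
    print(io_equivalent(sum_loop, sum_formula))                   # True
    print(round(trace_similarity(sum_loop, sum_formula, 10), 3))  # 0.083
```

An output-only check accepts the pair as clones, while even this toy trace comparison shows that the two fragments do very different work at run time. That gap is what a behavioral view of clones would need to capture.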
