Semantic Code Clone Detection Via Event Embedding Tree and GAT Network

Semantic code clone detection is an important yet challenging task in software engineering. Traditional methods rely on expert experience and cannot automatically determine which features are most useful for detecting semantic clones. Moreover, they ignore program dynamics, such as the execution characteristics and execution order of statements, which limits their ability to detect semantic clones. To address these issues, we propose a code clone detection method based on an event embedding tree and a Graph Attention Network (GAT). Our method uses the program's control flow graph to capture the execution characteristics of each statement and to extract the contextual relationships among statements in the control flow. From this information, it computes the functional similarity of two pieces of code and thereby identifies semantically similar code fragments. Experimental results show that our method outperforms state-of-the-art open-source methods for Type-3 (syntactic) and Type-4 (semantic) clone detection.
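The general idea of attending over a control flow graph and comparing two graph-level embeddings can be illustrated with a minimal sketch. This is not the implementation evaluated in the paper: the one-hot statement features, the random weights `W` and `A`, the mean pooling in `embed_graph`, and the cosine score are all assumptions made for illustration, and every function name is hypothetical. Only the attention rule (a LeakyReLU score over concatenated projections, softmax-normalised over each node's neighbours) follows the standard single-head graph attention formulation, here without an output non-linearity.

```python
import math
import random


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def gat_layer(features, edges, weight, attn):
    """Single-head graph attention over a CFG given as (src, dst) edges.

    Each statement node attends to itself and to its control-flow
    predecessors (an assumption of this sketch).
    """
    n, out_dim = len(features), len(weight[0])
    # Linear projection h_i = W^T x_i for every statement node.
    h = [[sum(f[k] * weight[k][d] for k in range(len(f))) for d in range(out_dim)]
         for f in features]
    neigh = [[i] for i in range(n)]          # self-loops
    for src, dst in edges:
        neigh[dst].append(src)
    out = []
    for i in range(n):
        # Unnormalised score e_ij = LeakyReLU(a . [h_i || h_j]).
        scores = [leaky_relu(sum(a * x for a, x in zip(attn, h[i] + h[j])))
                  for j in neigh[i]]
        top = max(scores)                    # subtract max for a stable softmax
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        # Attention-weighted sum of the neighbours' projected features.
        out.append([sum(e / total * h[j][d] for e, j in zip(exps, neigh[i]))
                    for d in range(out_dim)])
    return out


def embed_graph(features, edges, weight, attn):
    """Mean-pool node outputs into one vector (pooling is an assumption)."""
    nodes = gat_layer(features, edges, weight, attn)
    return [sum(col) / len(nodes) for col in zip(*nodes)]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Untrained, randomly initialised parameters; a real model would learn them.
rng = random.Random(0)
IN_DIM, OUT_DIM = 3, 4
W = [[rng.uniform(-1, 1) for _ in range(OUT_DIM)] for _ in range(IN_DIM)]
A = [rng.uniform(-1, 1) for _ in range(2 * OUT_DIM)]

# Two toy CFGs with placeholder one-hot statement features.
feats_a = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
edges_a = [(0, 1), (1, 2), (2, 1)]   # entry -> loop head -> body -> loop head
feats_b = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
edges_b = [(0, 1), (1, 2)]           # straight-line code, no back edge

emb_a = embed_graph(feats_a, edges_a, W, A)
emb_b = embed_graph(feats_b, edges_b, W, A)
print(cosine(emb_a, emb_a), cosine(emb_a, emb_b))
```

With untrained weights the score between the two toy graphs carries no meaning; the sketch only shows how control-flow structure (here, the back edge in `edges_a`) changes which statements each node attends to, and therefore the resulting graph embedding.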
