Code-to-Code Search Based on Deep Neural Network and Code Mutation

Deep Neural Networks (DNNs) have often been used to label images (e.g., in object detection). Although DNNs can also be applied to labeling code fragments (i.e., code-to-code search) in software engineering, their learning process requires a large number of code fragments for each label. In this paper, we present an approach to code-to-code search based on a DNN model and code mutation, which generates a sufficient number of code fragments for each label. A preliminary experiment shows that the proposed approach achieves high precision and recall.
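The core idea of using code mutation to enlarge the training data for each label can be sketched as follows. The abstract does not specify the mutation operators or the programming language of the code fragments. This sketch therefore assumes Python fragments and a single operator, consistent renaming of parameters and local variables (which yields Type-2 clones), purely for illustration. The helpers `mutate` and `augment` are hypothetical names, not the authors' implementation.

```python
import ast
import random


class _RenameLocals(ast.NodeTransformer):
    """Consistently renames parameters and local variables (a Type-2 clone mutation).

    Illustrative assumption: the paper's actual mutation operators are not given.
    """

    def __init__(self, mapping):
        self.mapping = mapping

    def visit_Name(self, node):
        # Rename both reads and writes of a local name so behavior is preserved.
        if node.id in self.mapping:
            node.id = self.mapping[node.id]
        return node

    def visit_arg(self, node):
        # Rename function parameters, then visit any annotation they carry.
        if node.arg in self.mapping:
            node.arg = self.mapping[node.arg]
        self.generic_visit(node)
        return node


def _local_names(tree):
    """Collects parameter names and names assigned within the fragment."""
    names = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.arg):
            names.add(node.arg)
        elif isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            names.add(node.id)
    return sorted(names)


def mutate(source, rng):
    """Returns one mutant of `source` whose local identifiers are renamed (hypothetical helper)."""
    tree = ast.parse(source)
    # The index suffix keeps every new name distinct even if random numbers repeat.
    mapping = {
        name: f"v{rng.randrange(10**6)}_{i}"
        for i, name in enumerate(_local_names(tree))
    }
    tree = _RenameLocals(mapping).visit(tree)
    return ast.unparse(tree)  # requires Python 3.9+


def augment(labeled_fragments, copies, seed=0):
    """Expands each (fragment, label) pair with `copies` mutants sharing the same label."""
    rng = random.Random(seed)
    dataset = []
    for source, label in labeled_fragments:
        dataset.append((source, label))
        for _ in range(copies):
            dataset.append((mutate(source, rng), label))
    return dataset
```

Because each mutant inherits the label of the fragment it was derived from, a label that starts with a single example ends up with `copies + 1` training examples. These could then be vectorized and fed to the DNN. Other operators, such as statement insertion or deletion to produce Type-3 clones, could be added in the same way.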
