Convolutional neural networks on assembly code for predicting software defects

Software defect prediction is one of the most attractive research topics in software engineering. The task is to predict whether a program contains semantic bugs. Previous studies apply conventional machine learning techniques to software metrics, or deep learning to tree representations of source code called abstract syntax trees. This paper formulates an approach to software defect prediction in which source code is first compiled into assembly code, and a multi-view convolutional neural network is then applied to learn defect features automatically from the assembly instruction sequences. Experimental results on four real-world datasets indicate that exploiting assembly code is beneficial for detecting semantic bugs. Our approach significantly outperforms baselines based on software metrics and abstract syntax trees.
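To make the pipeline concrete, the following is a minimal sketch of its front end, not the authors' implementation. It assumes the assembly text has already been produced by a compiler (for example with an option such as `gcc -S`; the toolchain actually used is not stated here) and that it follows x86 AT&T syntax. The names `tokenize_assembly`, `build_vocab`, `one_hot`, and `conv1d_max_pool` are hypothetical, the one-hot encoding stands in for learned embeddings, and a single hand-set filter stands in for the multi-view network, whose architecture is not described in this abstract.

```python
import re

# Hypothetical sample of compiler output (x86 AT&T syntax is an assumption).
SAMPLE_ASM = """
main:
    pushq %rbp
    movq %rsp, %rbp
    movl $0, %eax
    popq %rbp
    ret
"""

def tokenize_assembly(asm_text):
    """Split assembly text into instructions of the form [opcode, operand, ...]."""
    instructions = []
    for line in asm_text.splitlines():
        line = line.split("#", 1)[0].strip()  # '#' starts a comment in this syntax
        if not line or line.startswith(".") or line.endswith(":"):
            continue  # skip blank lines, assembler directives, and labels
        instructions.append(re.split(r"[\s,]+", line))
    return instructions

def build_vocab(opcodes):
    """Map each distinct opcode to an integer id; id 0 is reserved for padding."""
    vocab = {}
    for op in opcodes:
        vocab.setdefault(op, len(vocab) + 1)
    return vocab

def one_hot(index, size):
    """One-hot vector, used here in place of a learned embedding."""
    return [1.0 if i == index else 0.0 for i in range(size)]

def conv1d_max_pool(sequence, kernel, bias=0.0):
    """Slide one filter over the sequence, apply ReLU, and keep the maximum."""
    width = len(kernel)
    best = 0.0  # ReLU floor
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        activation = bias + sum(
            x * w
            for vec, wvec in zip(window, kernel)
            for x, w in zip(vec, wvec)
        )
        best = max(best, activation)
    return best

instructions = tokenize_assembly(SAMPLE_ASM)
opcodes = [ins[0] for ins in instructions]
vocab = build_vocab(opcodes)
size = len(vocab) + 1
embedded = [one_hot(vocab[op], size) for op in opcodes]

# A hand-set width-2 filter that responds to "movq" followed by "movl".
kernel = [one_hot(vocab["movq"], size), one_hot(vocab["movl"], size)]
feature = conv1d_max_pool(embedded, kernel)
```

In a trained model the filter weights and embeddings would be learned from labeled programs, and many filters of several widths would feed a classifier that outputs the defect prediction.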
