Automatic Classification Method for Software Vulnerability Based on Deep Neural Network

Software vulnerabilities are a root cause of many security risks. Once a vulnerability is exploited in a malicious attack, it can seriously compromise the security of the system and may even cause catastrophic losses. Automatic classification methods are therefore desirable to manage software vulnerabilities effectively, improve system security, and reduce the risk of the system being attacked and damaged. In this paper, a new automatic vulnerability classification model (TFI-DNN) is proposed. The model is built upon term frequency-inverse document frequency (TF-IDF), information gain (IG), and a deep neural network (DNN): TF-IDF is used to calculate the frequency and weight of each word in the vulnerability description; IG is used for feature selection to obtain an optimal set of feature words; and the DNN is used to construct an automatic vulnerability classifier. The U.S. National Vulnerability Database (NVD) is used to validate the effectiveness of the proposed model. Compared with SVM, Naive Bayes, and KNN, the TFI-DNN model achieves better performance across several evaluation metrics, including accuracy, recall, precision, and F1-score.
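The first two stages of the pipeline described above (TF-IDF weighting followed by IG-based feature selection) can be illustrated with a minimal sketch. It is not the authors' implementation: the toy descriptions, the class labels, the helper names (`tf_idf`, `information_gain`, `select_features`), and the specific TF-IDF variant (length-normalized term frequency with a natural-log IDF) are all assumptions made for illustration. The DNN classifier stage is omitted.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: tf = count / doc length, idf = ln(N / document frequency).
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({t: (c / len(doc)) * math.log(n / df[t])
                        for t, c in counts.items()})
    return weights

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def information_gain(docs, labels, term):
    """Reduction in class entropy from knowing whether `term` occurs."""
    present = [y for d, y in zip(docs, labels) if term in d]
    absent = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    conditional = sum(len(part) / n * entropy(part)
                      for part in (present, absent) if part)
    return entropy(labels) - conditional

def select_features(docs, labels, k):
    """Keep the k vocabulary terms with the highest information gain."""
    vocab = sorted({t for d in docs for t in d})
    return sorted(vocab,
                  key=lambda t: information_gain(docs, labels, t),
                  reverse=True)[:k]

# Hypothetical vulnerability descriptions and classes (not taken from the NVD).
docs = [
    "buffer overflow in parser allows remote code execution".split(),
    "stack buffer overflow in image decoder".split(),
    "sql injection in login form".split(),
    "sql injection via search parameter".split(),
]
labels = ["overflow", "overflow", "injection", "injection"]

weights = tf_idf(docs)
selected = select_features(docs, labels, k=4)
print(selected)
```

In a full pipeline, the TF-IDF weights of the selected terms would form the input vector for the classifier. Here the terms that perfectly separate the two toy classes receive the highest gain.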
