An Automatic Source Code Vulnerability Detection Approach Based on KELM

Traditional vulnerability detection mostly relies on rules or source code similarity with manually defined vulnerability features. In practice, these rules or features are difficult to define accurately, usually cost considerable expert labor, and perform poorly in practical applications. To mitigate this issue, researchers have introduced neural networks that extract features automatically, making vulnerability detection more intelligent. The Bidirectional Long Short-Term Memory (Bi-LSTM) network has proved successful for software vulnerability detection. However, because of its complex processing of context information and its iterative training mechanism, Bi-LSTM is expensive to train. To improve training efficiency, we propose using the Extreme Learning Machine (ELM). The training process of ELM is noniterative, so the network converges quickly. Because ELM usually shows weak precision due to its simple network structure, we introduce the kernel method, yielding a kernel ELM (KELM). In the preprocessing stage of this framework, we use doc2vec for vector representation and multilevel symbolization for program symbolization. Experimental results show that doc2vec vector representation brings faster training and better generalization performance than word2vec. ELM converges much more quickly than Bi-LSTM, and the kernel method can effectively improve the precision of ELM while preserving training efficiency.
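To make the "noniterative training" claim concrete, the sketch below shows a binary kernel ELM using the standard closed-form solution of Huang et al.: the output weights are obtained as alpha = (I/C + Omega)^{-1} T, where Omega_ij = K(x_i, x_j), and a new sample is scored as f(x) = [K(x, x_1), ..., K(x, x_N)] alpha. Training is therefore a single linear solve rather than many gradient epochs. This is a minimal illustration, not the authors' implementation: the Gaussian (RBF) kernel, the values of `c` and `gamma`, the pure-Python solver, and the 2-D toy vectors standing in for doc2vec embeddings of code samples are all assumptions, and the names `KernelELM` and `solve_linear` are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def rbf_kernel(x: Vector, y: Vector, gamma: float) -> float:
    """Gaussian (RBF) kernel K(x, y) = exp(-gamma * ||x - y||^2) (assumed kernel choice)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented copy [a | b] so the caller's matrix is left untouched.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        acc = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - acc) / m[r][r]
    return x


class KernelELM:
    """Hypothetical binary kernel ELM: one linear solve, no training epochs."""

    def __init__(self, c: float = 100.0, gamma: float = 1.0) -> None:
        self.c = c          # regularization coefficient C (assumed value)
        self.gamma = gamma  # RBF kernel width (assumed value)
        self.train_x: List[Vector] = []
        self.alpha: List[float] = []

    def fit(self, xs: List[Vector], ts: Sequence[float]) -> "KernelELM":
        n = len(xs)
        # Kernel matrix Omega_ij = K(x_i, x_j), with I / C added to the diagonal.
        omega = [[rbf_kernel(xs[i], xs[j], self.gamma) for j in range(n)]
                 for i in range(n)]
        for i in range(n):
            omega[i][i] += 1.0 / self.c
        # alpha = (I / C + Omega)^{-1} T, computed once in closed form.
        self.alpha = solve_linear(omega, [float(t) for t in ts])
        self.train_x = list(xs)
        return self

    def decision(self, x: Vector) -> float:
        # f(x) = [K(x, x_1), ..., K(x, x_N)] alpha
        return sum(a * rbf_kernel(x, xi, self.gamma)
                   for a, xi in zip(self.alpha, self.train_x))

    def predict(self, x: Vector) -> int:
        # +1 = vulnerable, -1 = not vulnerable (assumed label encoding).
        return 1 if self.decision(x) >= 0 else -1


# Toy 2-D vectors standing in for doc2vec embeddings of code samples.
xs = [[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [0.9, 1.1]]
ts = [-1, -1, 1, 1]
model = KernelELM(c=100.0, gamma=2.0).fit(xs, ts)
print([model.predict(x) for x in xs])  # → [-1, -1, 1, 1]
```

The design choice that matters here is the closed form: the cost of `fit` is dominated by building the N×N kernel matrix and one O(N³) solve, which is why KELM avoids the repeated passes that Bi-LSTM training needs, at the price of memory and time that grow with the number of training samples.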
