Joint Prediction of Multiple Vulnerability Characteristics Through Multi-Task Learning

Software vulnerabilities seriously affect the security of computing systems, and new ones are continuously disclosed and reported. When documenting a vulnerability, characterizing its severity, exploitability and impact is critical for effective triage and management. Faced with an ever-growing number of new vulnerabilities, we observe a significant lag between the disclosure of a vulnerability and the specification of its characteristics. This lag calls for automated, reliable assessment of vulnerability characteristics to help security analysts allocate their limited effort to the potentially most serious vulnerabilities. Existing automated techniques for vulnerability assessment require hand-crafted features and balanced data, and they consider each characteristic independently, one at a time. In this paper, we propose a multi-task machine learning approach for the joint prediction of multiple vulnerability characteristics based on vulnerability descriptions. Our approach removes the requirement of balanced data, and it relies on neural networks that learn to extract features from training data. Using the large-scale vulnerability data in the Common Vulnerabilities and Exposures (CVE) database, we conduct extensive experiments to compare different configurations of neural network feature extractors, to study the impact of multi-task learning versus independent-task learning, and to investigate both the performance of our approach in predicting the characteristics of newly disclosed vulnerabilities and the minimum amount of historical vulnerability data required to train a reliable prediction model.
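The idea of joint prediction can be illustrated with a minimal sketch: one feature vector is computed from the vulnerability description and shared by several task-specific classifiers, and the training objective combines the per-task losses. Everything below is an assumption made for illustration and is not the model evaluated in the paper: the task names and class counts, the hashed bag-of-words extractor (standing in for a learned neural feature extractor), the linear heads, and the unweighted sum of cross-entropy losses.

```python
import math
import random

# Hypothetical tasks and class counts, chosen only for illustration.
TASKS = {"severity": 3, "exploitability": 2, "impact": 3}
DIM = 8  # size of the shared feature vector (assumed)


def shared_features(description, dim=DIM):
    """Toy shared extractor: bucket each token into a fixed-size count vector.

    A stand-in for a learned neural feature extractor.
    """
    vec = [0.0] * dim
    for token in description.lower().split():
        vec[sum(map(ord, token)) % dim] += 1.0
    return vec


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def init_heads(dim=DIM, seed=0):
    """One linear head per task; each head has one weight row per class."""
    rng = random.Random(seed)
    return {
        task: [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(n)]
        for task, n in TASKS.items()
    }


def predict(heads, description):
    """Compute the shared features once, then a class distribution per task."""
    feats = shared_features(description)
    return {
        task: softmax([sum(w * x for w, x in zip(row, feats)) for row in rows])
        for task, rows in heads.items()
    }


def joint_loss(heads, description, labels):
    """Unweighted sum of per-task cross-entropy losses (an assumed objective)."""
    probs = predict(heads, description)
    return sum(-math.log(probs[task][labels[task]]) for task in TASKS)


heads = init_heads()
text = "Buffer overflow allows remote attackers to execute arbitrary code"
probs = predict(heads, text)
loss = joint_loss(heads, text, {"severity": 2, "exploitability": 1, "impact": 2})
```

In this sketch the heads never see the raw text, only the shared features, so a gradient step on the joint loss would update the extractor with signal from all tasks at once. That sharing is the property that distinguishes multi-task learning from training one independent model per characteristic.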
