A Conceptual Replication on Predicting the Severity of Software Vulnerabilities

Software vulnerabilities may lead to critical security risks in software systems. Prioritizing vulnerabilities is therefore an important task for security teams, and assessing how severe each vulnerability is helps teams during fixing and maintenance activities. We replicate a prior study that aims to predict the severity of software vulnerabilities by grouping them into severity levels. We follow its approach to feature extraction using word embeddings and to prediction using Convolutional Neural Networks (CNNs). In addition, we use Long Short-Term Memory (LSTM) and Extreme Gradient Boosting (XGBoost) models. We also extend the replicated work by predicting severity scores rather than levels. We carried out two experiments, predicting the severity levels and the severity scores of 82,974 vulnerabilities. On predicting severity levels, our LSTM and CNN models perform similarly, with F1 scores of 0.756 and 0.752, respectively. On predicting severity scores, the LSTM, CNN, and XGBoost models achieve mean absolute percentage error (MAPE) values of 16.14%, 17.03%, and 18.91%, respectively.
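
For readers unfamiliar with the score-prediction metric, the following is a minimal sketch of how MAPE is commonly computed. The function name `mape` and the sample scores are hypothetical illustrations, not data or code from the study, and the sketch assumes every actual score is nonzero.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Assumes both sequences have the same nonzero length and that
    no actual value is zero (the ratio is undefined otherwise).
    """
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for a, p in zip(actual, predicted):
        if a == 0:
            raise ValueError("actual values must be nonzero")
        # Relative error of one prediction against its true score.
        total += abs(a - p) / abs(a)
    return 100.0 * total / len(actual)


# Hypothetical severity scores, made up for illustration only.
actual_scores = [7.5, 5.0, 10.0, 4.0]
predicted_scores = [6.0, 5.5, 9.0, 5.0]
print(f"MAPE: {mape(actual_scores, predicted_scores):.2f}%")
```

A lower MAPE means predicted scores lie closer, in relative terms, to the true scores, which is why the reported values can be compared directly across the three models.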
