Automated Characterization of Software Vulnerabilities

Preventing vulnerability exploits is a critical software maintenance task, and software engineers often rely on Common Vulnerabilities and Exposures (CVE) reports for information about vulnerable systems and libraries. These reports include descriptions, disclosure sources, and manually populated vulnerability characteristics, such as the root cause, drawn from the NIST Vulnerability Description Ontology (VDO). This information needs to be complete and accurate so that stakeholders of affected products can prevent and respond to exploits of the reported vulnerabilities. In this study, we demonstrate that VDO characteristics can be automatically detected from the textual descriptions included in CVE reports. We evaluated the performance of six classification algorithms on a dataset of 365 vulnerability descriptions, each mapped to one of 19 characteristics from the VDO. This work demonstrates that it is feasible to train classification techniques to accurately characterize vulnerabilities from their descriptions. All six classifiers evaluated produced accurate results, and the Support Vector Machine was the best-performing individual classifier. Automating the vulnerability characterization process is a step toward ensuring that stakeholders have the data they need to maintain their systems effectively.
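
To make the approach concrete, the sketch below shows one way such a characterization model could be built: a linear Support Vector Machine trained on TF-IDF features extracted from CVE description text, which then predicts a characteristic label for an unseen description. The scikit-learn toolchain, the sample descriptions, and the label names are illustrative assumptions for this sketch; they do not reproduce the study's actual dataset, feature engineering, or implementation.

# Illustrative sketch (assumption): a TF-IDF + linear SVM text classifier that
# maps a CVE description to a vulnerability characteristic label. The example
# data, label names, and scikit-learn toolchain are placeholders, not the
# study's actual dataset or implementation.
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

# Hypothetical training pairs: (CVE description text, characteristic label).
descriptions = [
    "Buffer overflow in the image parser allows remote code execution.",
    "Heap-based buffer overflow in the PDF renderer corrupts memory.",
    "Improper input validation in the login form leads to SQL injection.",
    "Unsanitized query parameters allow injection of arbitrary SQL.",
]
labels = [
    "Memory Corruption",            # placeholder characteristic names,
    "Memory Corruption",            # not the official VDO vocabulary
    "Improper Input Validation",
    "Improper Input Validation",
]

# Word unigram/bigram TF-IDF features feed a linear Support Vector Machine.
classifier = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2), sublinear_tf=True)),
    ("svm", LinearSVC()),
])
classifier.fit(descriptions, labels)

# Characterize a previously unseen description.
new_cve = ["Stack-based buffer overflow in the audio codec allows code execution."]
print(classifier.predict(new_cve)[0])  # expected: "Memory Corruption"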
