Proactive identification of exploits in the wild through vulnerability mentions online

The number of software vulnerabilities discovered and publicly disclosed is increasing every year; however, only a small fraction of them is exploited in real-world attacks. With limitations on time and skilled resources, organizations often look at ways to identify threatened vulnerabilities for patch prioritization. In this paper, we present an exploit prediction model that predicts whether a vulnerability will be exploited. Our proposed model leverages data from a variety of online data sources (white-hat community, vulnerability researchers community, and darkweb/deepweb sites) with vulnerability mentions. Compared to the standard scoring system (CVSS base score), our model outperforms the baseline models with an F1 measure of 0.40 on the minority class (266% improvement over CVSS base score) and also achieves high True Positive Rate at low False Positive Rate (90%, 13%, respectively). The results demonstrate that the model is highly effective as an early predictor of exploits that could appear in the wild. We also present a qualitative and quantitative study regarding the increase in the likelihood of exploitation incurred when a vulnerability is mentioned in each of the data sources we examine.

[1]  Fabio Massacci,et al.  Comparing Vulnerability Severity and Exploits Using Case-Control Studies , 2014, TSEC.

[2]  Christopher L. Smith,et al.  Predicting Exploitation of Disclosed Software Vulnerabilities Using Open-source Data , 2017, IWSPA@CODASPY.

[3]  T. Holt,et al.  Exploring stolen data markets online: products and market forces , 2010 .

[4]  Leo Breiman,et al.  Random Forests , 2001, Machine Learning.

[5]  Stefan Savage,et al.  An analysis of underground forums , 2011, IMC '11.

[6]  Alan Said,et al.  Predicting Cyber Vulnerability Exploits with Machine Learning , 2015, Scandinavian Conference on AI.

[7]  Tudor Dumitras,et al.  Some Vulnerabilities Are Different Than Others - Studying Vulnerabilities and Attack Surfaces in the Wild , 2014, RAID.

[8]  Mehran Bozorgi,et al.  Beyond heuristics: learning to classify vulnerabilities and predict exploits , 2010, KDD.

[9]  Paulo Shakarian,et al.  Exploring Malicious Hacker Forums , 2016, Cyber Deception.

[10]  Christopher J. Novak,et al.  2009 Data Breach Investigations Report , 2009 .

[11]  Ahmad Diab,et al.  Darknet and deepnet mining for proactive cybersecurity threat intelligence , 2016, 2016 IEEE Conference on Intelligence and Security Informatics (ISI).

[12]  Fabio Massacci,et al.  Comparing Vulnerability Severity and Exploits Using Case-Control Studies , 2013, TSEC.

[13]  Tudor Dumitras,et al.  Vulnerability Disclosure in the Age of Social Media: Exploiting Twitter for Predicting Real-World Exploits , 2015, USENIX Security Symposium.

[14]  Ahmad Diab,et al.  Product offerings in malicious hacker markets , 2016, 2016 IEEE Conference on Intelligence and Security Informatics (ISI).

[15]  Fabio Massacci,et al.  Quantitative Assessment of Risk Reduction with Cybercrime Black Market Monitoring , 2013, 2013 IEEE Security and Privacy Workshops.