Patch Before Exploited: An Approach to Identify Targeted Software Vulnerabilities

The number of software vulnerabilities discovered and publicly disclosed is increasing every year; however, only a small fraction of these vulnerabilities are exploited in real-world attacks. With limitations on time and skilled resources, organizations often look at ways to identify threatened vulnerabilities for patch prioritization. In this chapter, an exploit prediction model is presented, which predicts whether a vulnerability will likely be exploited. Our proposed model leverages data from a variety of online data sources (white hat community, vulnerability research community, and dark web/deep web (DW) websites) with vulnerability mentions. Compared to the standard scoring system (CVSS base score) and a benchmark model that leverages Twitter data in exploit prediction, our model outperforms the baseline models with an F1 measure of 0.40 on the minority class (266% improvement over CVSS base score) and also achieves high true positive rate and low false positive rate (90%, 13%, respectively), making it highly effective as an early predictor of exploits that could appear in the wild. A qualitative and a quantitative study are also conducted to investigate whether the likelihood of exploitation increases if a vulnerability is mentioned in each of the examined data sources. The proposed model is proven to be much more robust than adversarial examples—postings authored by adversaries in the attempt to induce the model to produce incorrect predictions. A discussion on the viability of the model is provided, showing cases where the classifier achieves high performance, and other cases where the classifier performs less efficiently.

[1]  Tudor Dumitras,et al.  Vulnerability Disclosure in the Age of Social Media: Exploiting Twitter for Predicting Real-World Exploits , 2015, USENIX Security Symposium.

[2]  Francisco Herrera,et al.  A Review on Ensembles for the Class Imbalance Problem: Bagging-, Boosting-, and Hybrid-Based Approaches , 2012, IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews).

[3]  Vern Paxson,et al.  The Matter of Heartbleed , 2014, Internet Measurement Conference.

[4]  Fabio Massacci,et al.  Comparing Vulnerability Severity and Exploits Using Case-Control Studies , 2013, TSEC.

[5]  Leyla Bilge,et al.  Before we knew it: an empirical study of zero-day attacks in the real world , 2012, CCS.

[6]  Peter L. Bartlett,et al.  Open problems in the security of learning , 2008, AISec '08.

[7]  Corinna Cortes,et al.  Support-Vector Networks , 1995, Machine Learning.

[8]  T. Holt,et al.  Exploring stolen data markets online: products and market forces , 2010 .

[9]  Luca Allodi,et al.  Economic Factors of Vulnerability Trade and Exploitation , 2017, CCS.

[10]  Nicolas Christin,et al.  Automatically Detecting Vulnerable Websites Before They Turn Malicious , 2014, USENIX Security Symposium.

[11]  Blaine Nelson,et al.  Support Vector Machines Under Adversarial Label Noise , 2011, ACML.

[12]  Leo Breiman,et al.  Bagging Predictors , 1996, Machine Learning.

[13]  Timothy W. Finin,et al.  CyberTwitter: Using Twitter to generate alerts for cybersecurity threats and vulnerabilities , 2016, 2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM).

[14]  Christopher L. Smith,et al.  Predicting Exploitation of Disclosed Software Vulnerabilities Using Open-source Data , 2017, IWSPA@CODASPY.

[15]  Fabio Massacci,et al.  Comparing Vulnerability Severity and Exploits Using Case-Control Studies , 2014, TSEC.

[16]  Ahmad Diab,et al.  Darknet and deepnet mining for proactive cybersecurity threat intelligence , 2016, 2016 IEEE Conference on Intelligence and Security Informatics (ISI).

[17]  Ahmad Diab,et al.  Product offerings in malicious hacker markets , 2016, 2016 IEEE Conference on Intelligence and Security Informatics (ISI).

[18]  Mehran Bozorgi,et al.  Beyond heuristics: learning to classify vulnerabilities and predict exploits , 2010, KDD.

[19]  Nitesh V. Chawla,et al.  SMOTE: Synthetic Minority Over-sampling Technique , 2002, J. Artif. Intell. Res..

[20]  Fabio Massacci,et al.  The Work-Averse Cyber Attacker Model: Theory and Evidence From Two Million Attack Signatures , 2017 .

[21]  Paulo Shakarian,et al.  Proactive identification of exploits in the wild through vulnerability mentions online , 2017, 2017 International Conference on Cyber Conflict (CyCon U.S.).

[22]  Ahmad Diab,et al.  Darkweb Cyber Threat Intelligence Mining , 2017 .

[23]  Doina Caragea,et al.  An Empirical Study on Using the National Vulnerability Database to Predict Software Vulnerabilities , 2011, DEXA.

[24]  Shlomo Shamai,et al.  Mutual information and minimum mean-square error in Gaussian channels , 2004, IEEE Transactions on Information Theory.

[25]  Paulo Shakarian,et al.  Exploring Malicious Hacker Forums , 2016, Cyber Deception.

[26]  Hsinchun Chen,et al.  AZSecure Hacker Assets Portal: Cyber threat intelligence and malware analysis , 2016, 2016 IEEE Conference on Intelligence and Security Informatics (ISI).

[27]  Blaine Nelson,et al.  The security of machine learning , 2010, Machine Learning.

[28]  Fabio Massacci,et al.  A preliminary analysis of vulnerability scores for attacks in wild: the ekits and sym datasets , 2012, BADGERS@CCS.

[29]  Fabio Massacci,et al.  Quantitative Assessment of Risk Reduction with Cybercrime Black Market Monitoring , 2013, 2013 IEEE Security and Privacy Workshops.

[30]  Nick Feamster,et al.  PREDATOR: Proactive Recognition and Elimination of Domain Abuse at Time-Of-Registration , 2016, CCS.

[31]  Alan Said,et al.  Predicting Cyber Vulnerability Exploits with Machine Learning , 2015, Scandinavian Conference on AI.

[32]  Tudor Dumitras,et al.  Some Vulnerabilities Are Different Than Others - Studying Vulnerabilities and Attack Surfaces in the Wild , 2014, RAID.

[33]  Leo Breiman,et al.  Random Forests , 2001, Machine Learning.

[34]  Stefan Savage,et al.  An analysis of underground forums , 2011, IMC '11.

[35]  Parinaz Naghizadeh Ardabili,et al.  Cloudy with a Chance of Breach: Forecasting Cyber Security Incidents , 2015, USENIX Security Symposium.

[36]  Bernhard Plattner,et al.  Modelling the Security Ecosystem- The Dynamics of (In)Security , 2009, WEIS.

[37]  Gaël Varoquaux,et al.  Scikit-learn: Machine Learning in Python , 2011, J. Mach. Learn. Res..