Using Twitter to Predict When Vulnerabilities will be Exploited

When a new cyber-vulnerability is detected, a Common Vulnerability and Exposure (CVE) number is attached to it. Malicious "exploits'' may use these vulnerabilities to carry out attacks. Unlike works which study if a CVE will be used in an exploit, we study the problem of predicting when an exploit is first seen. This is an important question for system administrators as they need to devote scarce resources to take corrective action when a new vulnerability emerges. Moreover, past works assume that CVSS scores (released by NIST) are available for predictions, but we show on average that 49% of real world exploits occur before CVSS scores are published. This means that past works, which use CVSS scores, miss almost half of the exploits. In this paper, we propose a novel framework to predict when a vulnerability will be exploited via Twitter discussion, without using CVSS score information. We introduce the unique concept of a family of CVE-Author-Tweet (CAT) graphs and build a novel set of features based on such graphs. We define recurrence relations capturing "hotness" of tweets, "expertise" of Twitter users on CVEs, and "availability" of information about CVEs, and prove that we can solve these recurrences via a fix point algorithm. Our second innovation adopts Hawkes processes to estimate the number of tweets/retweets related to the CVEs. Using the above two sets of novel features, we propose two ensemble forecast models FEEU (for classification) and FRET (for regression) to predict when a CVE will be exploited. Compared with natural adaptations of past works (which predict if an exploit will be used), FEEU increases F1 score by 25.1%, while FRET decreases MAE by 37.2%.

[1]  Leman Akoglu,et al.  A Domain-Agnostic Approach to Spam-URL Detection via Redirects , 2017, PAKDD.

[2]  Alan Said,et al.  Predicting Cyber Vulnerability Exploits with Machine Learning , 2015, Scandinavian Conference on AI.

[3]  Mehran Bozorgi,et al.  Beyond heuristics: learning to classify vulnerabilities and predict exploits , 2010, KDD.

[4]  N. Johnson The MITRE corporation , 1961, ACM National Meeting.

[5]  A. Hawkes Spectra of some self-exciting and mutually exciting point processes , 1971 .

[6]  Quoc V. Le,et al.  Distributed Representations of Sentences and Documents , 2014, ICML.

[7]  Sushil Jajodia,et al.  VULCON: A System for Vulnerability Prioritization, Mitigation, and Management , 2018, ACM Trans. Priv. Secur..

[8]  Christos Faloutsos,et al.  Opinion Fraud Detection in Online Reviews by Network Effects , 2013, ICWSM.

[9]  Nitesh V. Chawla,et al.  SMOTE: Synthetic Minority Over-sampling Technique , 2002, J. Artif. Intell. Res..

[10]  A. Arenas,et al.  Mathematical Formulation of Multilayer Networks , 2013, 1307.4977.

[11]  Paulo Shakarian,et al.  Proactive identification of exploits in the wild through vulnerability mentions online , 2017, 2017 International Conference on Cyber Conflict (CyCon U.S.).

[12]  Tudor Dumitras,et al.  Vulnerability Disclosure in the Age of Social Media: Exploiting Twitter for Predicting Real-World Exploits , 2015, USENIX Security Symposium.

[13]  Christopher L. Smith,et al.  Predicting Exploitation of Disclosed Software Vulnerabilities Using Open-source Data , 2017, IWSPA@CODASPY.

[14]  Mason A. Porter,et al.  Multilayer networks , 2013, J. Complex Networks.

[15]  Scott Sanner,et al.  Expecting to be HIP: Hawkes Intensity Processes for Social Media Popularity , 2016, WWW.

[16]  Rajeev Motwani,et al.  The PageRank Citation Ranking : Bringing Order to the Web , 1999, WWW 1999.

[17]  Jure Leskovec,et al.  SEISMIC: A Self-Exciting Point Process Model for Predicting Tweet Popularity , 2015, KDD.

[18]  V. S. Subrahmanian,et al.  Ensemble Models for Data-driven Prediction of Malware Infections , 2016, WSDM.

[19]  Paulo Shakarian,et al.  DarkEmbed: Exploit Prediction With Neural Language Models , 2018, AAAI.