Challenges in Forecasting Malicious Events from Incomplete Data

The ability to accurately predict cyber-attacks would enable organizations to mitigate this growing threat and avert the financial losses and disruptions that attacks cause. But how predictable are cyber-attacks? Researchers have attempted to combine external data (ranging from vulnerability disclosures to discussions on Twitter and the dark web) with machine learning algorithms to learn indicators of impending cyber-attacks. However, successful cyber-attacks represent a tiny fraction of all attempted attacks: the vast majority are stopped or filtered by the security appliances deployed at the target. As we show in this paper, this filtering reduces the predictability of cyber-attacks. The small number of attacks that do penetrate the target’s defenses follow a different generative process than the full set of attempted attacks, one that is much harder for predictive models to learn. One possible cause is that the time series of successful attacks depends on the filtering process in addition to all the factors that drive the original time series. We empirically quantify the loss of predictability due to filtering using real-world data from two organizations. Our work identifies the limits to forecasting cyber-attacks from highly filtered data.
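The abstract does not say how predictability is measured. As a minimal sketch, assume normalized permutation entropy (Bandt and Pompe), a common model-free measure for time series where 0 means fully regular ordinal patterns and 1 means all patterns are equally likely. The sketch also assumes a hypothetical toy model, not the paper's data: attempted attacks arrive hourly as Poisson counts with an assumed daily cycle, and each attempt independently evades the defenses with a small fixed probability (binomial thinning). The functions `simulate` and `permutation_entropy` and all parameter values are illustrative.

```python
import math
from collections import Counter

import numpy as np


def permutation_entropy(x, order=3, delay=1):
    """Assumed predictability measure: normalized permutation entropy in [0, 1]."""
    x = np.asarray(x, dtype=float)
    n_windows = len(x) - (order - 1) * delay
    if n_windows <= 0:
        raise ValueError("series is too short for the chosen order and delay")
    # Map each window of `order` points to its ordinal pattern (rank order).
    # A stable sort breaks ties by position, so the mapping is deterministic.
    counts = Counter(
        tuple(np.argsort(x[i : i + order * delay : delay], kind="stable"))
        for i in range(n_windows)
    )
    probs = np.array(list(counts.values()), dtype=float) / n_windows
    # Shannon entropy of the pattern distribution, divided by its maximum log(order!).
    return float(-(probs * np.log(probs)).sum() / math.log(math.factorial(order)))


def simulate(hours=24 * 200, base_rate=1000.0, pass_prob=0.01, seed=0):
    """Hypothetical toy model: hourly attempted attacks and the few that get through."""
    rng = np.random.default_rng(seed)
    t = np.arange(hours)
    # Assumed daily cycle in the rate of attempted attacks.
    rate = base_rate * (1.0 + 0.8 * np.sin(2 * np.pi * t / 24))
    attempted = rng.poisson(rate)
    # Assumed filter: each attempt independently evades the defenses with probability pass_prob.
    successful = rng.binomial(attempted, pass_prob)
    return attempted, successful


attempted, successful = simulate()
print(f"attempted:  {permutation_entropy(attempted):.3f}")
print(f"successful: {permutation_entropy(successful):.3f}")
```

Under these assumptions the thinned series of successful attacks should show a higher permutation entropy than the attempted series, because the daily trend is swamped by counting noise once only a small fraction of attempts remain. Real security appliances do not filter at random, so this only illustrates the direction of the effect described above.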
