Big Data for Cybersecurity: Vulnerability Disclosure Trends and Dependencies

Complex Big Data systems in modern organisations are increasingly targeted by existing and emerging threat agents, who craft ever more elaborate and specialised attacks to exploit vulnerabilities and weaknesses. As cybercrime and incidents arising from these vulnerabilities continue to grow, effective vulnerability management is imperative for organisations of every size. However, organisations struggle to manage the sheer volume of vulnerabilities discovered on their networks, and vulnerability management tends to be reactive in practice. Rigorous statistical models that simulate the anticipated volume and dependence of vulnerability disclosures can provide important insights and help organisations manage cyber risks more proactively. We propose a novel and rigorous framework that enables this capability by leveraging rich yet complex historical vulnerability data. Using this framework, we study not only how to handle persistent volatility in the data but also how to unveil the multivariate dependence structure amongst different vulnerability risks. In contrast to existing studies of univariate time series, we consider the more general multivariate case in order to capture the relationships between the series. Through extensive empirical studies on real-world vulnerability data, we show that a composite model can effectively capture and preserve long-term dependency between different vulnerability and exploit disclosures. The paper also paves the way for further study of vulnerability proliferation from a stochastic perspective, towards more accurate measures for better cyber risk management as a whole.
