Time series forecast modeling of vulnerabilities in the Android operating system using ARIMA and deep learning methods

Abstract Security vulnerability prediction models allow estimation of the number of potential vulnerabilities and evaluation of the risks they cause. In particular, modeling the vulnerabilities that may occur in software versions over time helps in taking the necessary countermeasures. These models are crucial in areas such as determining the amount of resources required to cope with security vulnerabilities. Based on reported vulnerabilities, OS companies can anticipate the actions they need to take and make strategic and operational decisions on matters such as secure deployment, backup provisioning, and disaster recovery. Although many vulnerability prediction models have been constructed, most of these are for operating systems and internet browsers, and none exist for the Android mobile operating system, which has the highest number of users. In contrast to other studies, the present study investigated Android vulnerabilities that directly depend on time. Time series, multilayer perceptron (MLP), convolutional neural network (CNN), long short-term memory (LSTM), convolutional LSTM (ConvLSTM), and CNN-LSTM based models were generated, and the model with the lowest error rate for predicting future security vulnerabilities was selected as the best. Data for the creation of the models were obtained by filtering security vulnerabilities published in the National Vulnerability Database (NVD) using the keyword Android. The LSTM model had an error rate of 26.830 and the ARIMA model had an error rate of 18.449. Finally, it was determined that LSTM-based algorithms reach error rates that can compete with classical time series models despite limited data.
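The comparison described in the abstract (fit a model on past vulnerability counts, forecast held-out periods, and score the forecast by its error) can be illustrated with a minimal sketch. This is not the authors' implementation: the ARIMA(1,1,0)-style least-squares fit below is a simplified stand-in for a full ARIMA model, the error metric is assumed to be RMSE (the abstract only says "error rate"), and the monthly counts are made-up values, not NVD data. All function names are hypothetical.

```python
import math


def difference(series):
    """First-order differencing: the 'I' (d=1) step of an ARIMA model."""
    return [b - a for a, b in zip(series, series[1:])]


def fit_ar1(diffs):
    """Least-squares fit of d[t] = c + phi * d[t-1] on the differenced series."""
    x, y = diffs[:-1], diffs[1:]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    # Fall back to a constant-drift model when the differences do not vary.
    phi = cov_xy / var_x if var_x else 0.0
    c = mean_y - phi * mean_x
    return c, phi


def forecast(series, steps):
    """Forecast `steps` future values with the ARIMA(1,1,0)-style model."""
    diffs = difference(series)
    c, phi = fit_ar1(diffs)
    last_value, last_diff = series[-1], diffs[-1]
    predictions = []
    for _ in range(steps):
        last_diff = c + phi * last_diff      # predict the next difference
        last_value = last_value + last_diff  # undo the differencing
        predictions.append(last_value)
    return predictions


def rmse(actual, predicted):
    """Root mean squared error (assumed metric; the abstract does not name one)."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical monthly vulnerability counts, invented for illustration only.
counts = [12, 15, 14, 20, 23, 22, 27, 31, 30, 36, 40, 39, 44, 49, 47, 53]
train, test = counts[:12], counts[12:]

predictions = forecast(train, len(test))
error = rmse(test, predictions)
print("forecast:", [round(p, 1) for p in predictions])
print(f"RMSE: {error:.3f}")
```

A neural model such as an LSTM would be scored the same way: train on the first part of the series, forecast the held-out part, and compare the resulting error with that of the classical model.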
