Modeling Skewness in Vulnerability Discovery

A vulnerability discovery model describes the rate at which vulnerabilities are discovered in a software product. Recent studies have shown that the S-shaped Alhazmi–Malaiya Logistic (AML) vulnerability discovery model often fits better than other models and demonstrates superior prediction capability for several major software systems. However, the AML model is based on the logistic distribution, which assumes a symmetrical discovery process with a peak in the center. Hence, when the discovery process does not follow a symmetrical pattern, a discovery model based on an asymmetrical distribution can be expected to perform better. Here, the relationship between the performance of S-shaped vulnerability discovery models and the skewness of the target vulnerability datasets is examined. To study the possible dependence on skew, alternative S-shaped models based on the Weibull, Beta, Gamma and Normal distributions are introduced and evaluated. The models are fitted to data from eight major software systems. The applicability of the models is examined using two separate approaches: a goodness-of-fit test to see how well the models track the data, and prediction capability measured by average error and average bias. It is observed that an excellent goodness of fit does not necessarily result in superior prediction capability. The results show that, when prediction capability is considered, all the right-skewed datasets are represented better by the Gamma distribution-based model. The symmetrical models tend to predict better for left-skewed datasets; the AML model is found to be the best among them. Copyright © 2013 John Wiley & Sons, Ltd.
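As a rough illustration of the two quantities the abstract relies on, the sketch below computes the skewness of a discovery process from per-period vulnerability counts and summarizes prediction capability as an average error and an average bias. It is a minimal sketch under stated assumptions, not the paper's procedure: the function names are hypothetical, skewness is taken as the standardized third moment of discovery time weighted by the counts, and error and bias are assumed to be the mean absolute and mean signed relative deviations of the predicted final totals from the actual total. The abstract does not give these formulas, and the sample counts are made up for illustration.

```python
def discovery_skewness(counts):
    """Moment skewness of discovery time, weighting period t by counts[t-1].

    Assumption: periods are equally spaced and numbered 1..n.
    Positive result = right-skewed (early peak, long tail); negative = left-skewed.
    """
    total = sum(counts)
    times = range(1, len(counts) + 1)
    mean = sum(t * c for t, c in zip(times, counts)) / total
    m2 = sum(c * (t - mean) ** 2 for t, c in zip(times, counts)) / total
    m3 = sum(c * (t - mean) ** 3 for t, c in zip(times, counts)) / total
    return m3 / m2 ** 1.5


def average_error_and_bias(predicted_totals, actual_total):
    """Return (average error, average bias) of predicted final totals.

    Assumption: each entry of predicted_totals is a model's estimate of the
    final vulnerability count made at a successive point in time.
    """
    relative = [(p - actual_total) / actual_total for p in predicted_totals]
    avg_error = sum(abs(r) for r in relative) / len(relative)
    avg_bias = sum(relative) / len(relative)
    return avg_error, avg_bias


if __name__ == "__main__":
    # Hypothetical counts per period, for illustration only.
    print(discovery_skewness([5, 4, 2, 1, 1, 1]))   # early peak
    print(discovery_skewness([1, 2, 3, 2, 1]))      # symmetric
    print(average_error_and_bias([90, 110], 100))
```

Under these assumptions, a bias near zero alongside a large error indicates over- and under-predictions that cancel out, which is why the two measures are reported together.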
