A study of applying the bounded Generalized Pareto distribution to the analysis of software fault distribution

Software is now a key part of many safety-critical applications. The main problem facing the computer industry, however, is how to develop software with (ultra-)high reliability on time while assuring its quality. Earlier studies reported that the Pareto distribution (PD) and Weibull distribution (WD) models can be used for software reliability estimation and fault distribution modeling. In this paper we propose a modified PD model to predict and assess the software fault distribution. Specifically, we suggest using a special form of the Generalized Pareto distribution (GPD) model, named the bounded Generalized Pareto distribution (BGPD) model. We show that the BGPD model eliminates several modeling issues that arise in the PD model, and we perform detailed comparisons based on real software fault data. Experimental results show that the proposed BGPD model fits the actual fault data very closely. We conclude that the distribution of faults in a large software system can be well described by the Pareto principle.
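
The abstract does not state how the BGPD is parameterized, so the following is only an illustrative sketch of why a generalized Pareto model can be bounded. It assumes the standard three-parameter GPD with shape, scale, and location parameters, in which a negative shape gives the distribution a finite upper endpoint. The function names `gpd_cdf` and `gpd_upper_bound` are hypothetical and are not taken from the paper.

```python
import math


def gpd_cdf(x, shape, scale, loc=0.0):
    """CDF of the standard generalized Pareto distribution.

    Assumed form (not necessarily the paper's BGPD parameterization):
        F(x) = 1 - (1 + shape * (x - loc) / scale) ** (-1 / shape)
    with the exponential limit 1 - exp(-(x - loc) / scale) when shape == 0.
    """
    if scale <= 0:
        raise ValueError("scale must be positive")
    z = (x - loc) / scale
    if z <= 0:
        return 0.0  # below the lower end of the support
    if shape == 0:
        return 1.0 - math.exp(-z)
    t = 1.0 + shape * z
    if t <= 0:
        # Only reachable when shape < 0: x is at or past the upper endpoint.
        return 1.0
    return 1.0 - t ** (-1.0 / shape)


def gpd_upper_bound(shape, scale, loc=0.0):
    """Upper endpoint of the support: finite only for a negative shape."""
    if shape < 0:
        return loc - scale / shape
    return math.inf
```

With a shape of -0.5 and a scale of 2.0, for example, `gpd_upper_bound` returns 4.0 and `gpd_cdf` reaches 1.0 at that point, whereas any non-negative shape leaves the support unbounded above, as in the ordinary Pareto model.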
