Where the bugs are

The ability to predict which files in a large software system are most likely to contain the largest numbers of faults in the next release can be a very valuable asset. To accomplish this, a negative binomial regression model using information from previous releases has been developed and used to predict the numbers of faults for a large industrial inventory system. The files of each release were sorted in descending order based on the predicted number of faults and then the first 20% of the files were selected. This was done for each of fifteen consecutive releases, representing more than four years of field usage. The predictions were extremely accurate, correctly selecting files that contained between 71% and 92% of the faults, with the overall average being 83%. In addition, the same model was used on data for the same system's releases, but with all fault data prior to integration testing removed. The prediction was again very accurate, ranging from 71% to 93%, with the average being 84%. Predictions were made for a second system, and again the first 20% of files accounted for 83% of the identified faults. Finally, a highly simplified predictor was considered which correctly predicted 73% and 74% of the faults for the two systems.

[1]  Les Hatton,et al.  Reexamining the Fault Density-Component Size Connection , 1997, IEEE Softw..

[2]  Victor R. Basili,et al.  Software errors and complexity: an empirical investigation , 1993 .

[3]  Harvey P. Siy,et al.  Predicting Fault Incidence Using Software Change History , 2000, IEEE Trans. Software Eng..

[4]  Anas N. Al-Rabadi,et al.  A comparison of modified reconstructability analysis and Ashenhurst‐Curtis decomposition of Boolean functions , 2004 .

[5]  Maurizio Pighin,et al.  An empirical analysis of fault persistence through software releases , 2003, 2003 International Symposium on Empirical Software Engineering, 2003. ISESE 2003. Proceedings..

[6]  Edward N. Adams,et al.  Optimizing Preventive Service of Software Products , 1984, IBM J. Res. Dev..

[7]  Elaine J. Weyuker,et al.  The distribution of faults in a large industrial software system , 2002, ISSTA '02.

[8]  Taghi M. Khoshgoftaar,et al.  The Detection of Fault-Prone Programs , 1992, IEEE Trans. Software Eng..

[9]  Taghi M. Khoshgoftaar,et al.  Early Quality Prediction: A Case Study in Telecommunications , 1996, IEEE Softw..

[10]  Daniel J. Paulish,et al.  An empirical investigation of software fault distribution , 1993, [1993] Proceedings First International Software Metrics Symposium.

[11]  Victor R. Basili,et al.  Software errors and complexity: an empirical investigation0 , 1984, CACM.

[12]  P. McCullagh,et al.  Generalized Linear Models , 1984 .

[13]  T. J. Ostrand,et al.  Using static analysis to determine where to focus dynamic testing effort , 2004 .

[14]  Norman E. Fenton,et al.  Quantitative Analysis of Faults and Failures in a Complex Software System , 2000, IEEE Trans. Software Eng..