Comparing negative binomial and recursive partitioning models for fault prediction

Two different software fault prediction models have been used to predict the N% of the files of a large software system that are likely to contain the largest numbers of faults. We used the same predictor variables in a negative binomial regression model and a recursive partitioning model, and compared their effectiveness on three large industrial software systems. The negative binomial model identified files that contain 76 to 93 percent of the faults, and recursive partitioning identified files that contain 68 to 85 percent.

[1]  Tim Menzies,et al.  Mining Repositories to Assist in Project Planning and Resource Allocation , 2004, MSR.

[2]  Norman E. Fenton,et al.  Quantitative Analysis of Faults and Failures in a Complex Software System , 2000, IEEE Trans. Software Eng..

[3]  Witold Pedrycz,et al.  Practical assessment of the models for identification of defect-prone classes in object-oriented commercial systems using design metrics , 2003, J. Syst. Softw..

[4]  Daniel J. Paulish,et al.  An empirical investigation of software fault distribution , 1993, [1993] Proceedings First International Software Metrics Symposium.

[5]  Maurizio Pighin,et al.  An empirical analysis of fault persistence through software releases , 2003, 2003 International Symposium on Empirical Software Engineering, 2003. ISESE 2003. Proceedings..

[6]  Elaine J. Weyuker,et al.  Automating algorithms for the identification of fault-prone files , 2007, ISSTA '07.

[7]  Hongfang Liu,et al.  An investigation of the effect of module size on defect prediction using static measures , 2005, PROMISE@ICSE.

[8]  Edward N. Adams,et al.  Optimizing Preventive Service of Software Products , 1984, IBM J. Res. Dev..

[9]  Giovanni Denaro,et al.  An empirical evaluation of fault-proneness models , 2002, ICSE '02.

[10]  Taghi M. Khoshgoftaar,et al.  Using regression trees to classify fault-prone software modules , 2002, IEEE Trans. Reliab..

[11]  P. McCullagh,et al.  Generalized Linear Models , 1972, Predictive Analytics.

[12]  Bojan Cukic,et al.  Robust prediction of fault-proneness by random forests , 2004, 15th International Symposium on Software Reliability Engineering.

[13]  C. Pipper,et al.  [''R"--project for statistical computing]. , 2008, Ugeskrift for laeger.

[14]  Audris Mockus,et al.  Does Code Decay? Assessing the Evidence from Change Management Data , 2001, IEEE Trans. Software Eng..

[15]  Lionel C. Briand,et al.  Predicting fault-prone components in a java legacy system , 2006, ISESE '06.

[16]  Taghi M. Khoshgoftaar,et al.  The Detection of Fault-Prone Programs , 1992, IEEE Trans. Software Eng..

[17]  Taghi M. Khoshgoftaar,et al.  Early Quality Prediction: A Case Study in Telecommunications , 1996, IEEE Softw..

[18]  Elaine J. Weyuker,et al.  Predicting the location and number of faults in large software systems , 2005, IEEE Transactions on Software Engineering.

[19]  Les Hatton,et al.  Reexamining the Fault Density-Component Size Connection , 1997, IEEE Softw..

[20]  Elaine J. Weyuker,et al.  The distribution of faults in a large industrial software system , 2002, ISSTA '02.

[21]  Leo Breiman,et al.  Random Forests , 2001, Machine Learning.

[22]  Victor R. Basili,et al.  Software errors and complexity: an empirical investigation0 , 1984, CACM.

[23]  Elaine J. Weyuker,et al.  Looking for bugs in all the right places , 2006, ISSTA '06.

[24]  Victor R. Basili,et al.  Software errors and complexity: an empirical investigation , 1993 .