Do too many cooks spoil the broth? Using the number of developers to enhance defect prediction models

Fault prediction by negative binomial regression models is shown to be effective for four large production software systems from industry. A model developed originally with data from systems with regularly scheduled releases was successfully adapted to a system without releases to identify 20% of that system’s files that contained 75% of the faults. A model with a pre-specified set of variables derived from earlier research was applied to three additional systems, and proved capable of identifying averages of 81, 94 and 76% of the faults in those systems. A primary focus of this paper is to investigate the impact on predictive accuracy of using data about the number of developers who access individual code units. For each system, including the cumulative number of developers who had previously modified a file yielded no more than a modest improvement in predictive accuracy. We conclude that while many factors can “spoil the broth” (lead to the release of software with too many defects), the number of developers is not a major influence.

[1]  Nachiappan Nagappan,et al.  Using Software Dependencies and Churn Metrics to Predict Field Failures: An Empirical Case Study , 2007, First International Symposium on Empirical Software Engineering and Measurement (ESEM 2007).

[2]  P. McCullagh,et al.  Generalized Linear Models, 2nd Edn. , 1990 .

[3]  Daniel J. Paulish,et al.  An empirical investigation of software fault distribution , 1993, [1993] Proceedings First International Software Metrics Symposium.

[4]  Niclas Ohlsson,et al.  Predicting Fault-Prone Software Modules in Telephone Switches , 1996, IEEE Trans. Software Eng..

[5]  Harvey P. Siy,et al.  Predicting Fault Incidence Using Software Change History , 2000, IEEE Trans. Software Eng..

[6]  Elaine J. Weyuker,et al.  Predicting the location and number of faults in large software systems , 2005, IEEE Transactions on Software Engineering.

[7]  Chris F. Kemerer,et al.  A Metrics Suite for Object Oriented Design , 2015, IEEE Trans. Software Eng..

[8]  Lionel C. Briand,et al.  Predicting fault-prone components in a java legacy system , 2006, ISESE '06.

[9]  Les Hatton,et al.  Reexamining the Fault Density-Component Size Connection , 1997, IEEE Softw..

[10]  Andreas Zeller,et al.  Mining metrics to predict component failures , 2006, ICSE.

[11]  Eric R. Ziegel,et al.  Generalized Linear Models , 2002, Technometrics.

[12]  Bojan Cukic,et al.  Robust prediction of fault-proneness by random forests , 2004, 15th International Symposium on Software Reliability Engineering.

[13]  Elaine J. Weyuker,et al.  Looking for bugs in all the right places , 2006, ISSTA '06.

[14]  Elaine J. Weyuker,et al.  The distribution of faults in a large industrial software system , 2002, ISSTA '02.

[15]  Ian Witten,et al.  Data Mining , 2000 .

[16]  P. McCullagh,et al.  Generalized Linear Models , 1992 .

[17]  Victor R. Basili,et al.  Software errors and complexity: an empirical investigation0 , 1984, CACM.

[18]  Tim Menzies,et al.  Data Mining Static Code Attributes to Learn Defect Predictors , 2007, IEEE Transactions on Software Engineering.

[19]  Audris Mockus,et al.  Predicting risk of software changes , 2000, Bell Labs Technical Journal.

[20]  Nachiappan Nagappan,et al.  Using Software Dependencies and Churn Metrics to Predict Field Failures: An Empirical Case Study , 2007, ESEM 2007.

[21]  Taghi M. Khoshgoftaar,et al.  The Detection of Fault-Prone Programs , 1992, IEEE Trans. Software Eng..

[22]  Taghi M. Khoshgoftaar,et al.  Early Quality Prediction: A Case Study in Telecommunications , 1996, IEEE Softw..

[23]  PedryczWitold,et al.  Practical assessment of the models for identification of defect-prone classes in object-oriented commercial systems using design metrics , 2003 .

[24]  Audris Mockus,et al.  Does Code Decay? Assessing the Evidence from Change Management Data , 2001, IEEE Trans. Software Eng..

[25]  Taghi M. Khoshgoftaar,et al.  Using regression trees to classify fault-prone software modules , 2002, IEEE Trans. Reliab..

[26]  Norman E. Fenton,et al.  Quantitative Analysis of Faults and Failures in a Complex Software System , 2000, IEEE Trans. Software Eng..

[27]  Witold Pedrycz,et al.  Practical assessment of the models for identification of defect-prone classes in object-oriented commercial systems using design metrics , 2003, J. Syst. Softw..

[28]  Victor R. Basili,et al.  Software errors and complexity: an empirical investigation , 1993 .

[29]  Maurizio Pighin,et al.  An empirical analysis of fault persistence through software releases , 2003, 2003 International Symposium on Empirical Software Engineering, 2003. ISESE 2003. Proceedings..

[30]  Elaine J. Weyuker,et al.  Automating algorithms for the identification of fault-prone files , 2007, ISSTA '07.

[31]  Edward N. Adams,et al.  Optimizing Preventive Service of Software Products , 1984, IBM J. Res. Dev..

[32]  Giovanni Denaro,et al.  An empirical evaluation of fault-proneness models , 2002, ICSE '02.