An Investigation into the Functional Form of the Size-Defect Relationship for Software Modules

The importance of the relationship between size and defect proneness of software modules is well recognized. Understanding the nature of that relationship can facilitate various development decisions related to prioritization of quality assurance activities. Overall, the previous research only drew a general conclusion that there was a monotonically increasing relationship between module size and defect proneness. In this study, we analyzed class-level size and defect data in order to increase our understanding of this crucial relationship. In order to obtain validated and more generalizable results, we studied four large-scale object-oriented products, Mozilla, Cn3d, JBoss, and Eclipse. Our results consistently revealed a significant effect of size on defect proneness; however, contrary to common intuition, the size-defect relationship took a logarithmic form, indicating that smaller classes were proportionally more problematic than larger classes. Therefore, practitioners should consider giving higher priority to smaller modules when planning focused quality assurance activities with limited resources. For example, in Mozilla and Eclipse, an inspection strategy investing 80% of available resources on 100-LOC classes and the rest on 1,000-LOC classes would be more than twice as cost effective as the opposite strategy. These results should be immediately useful to guide focused quality assurance activities in large-scale software projects.

[1]  Jeff Tian,et al.  Software quality engineering - testing, quality assurance, and quantifiable improvement , 2005 .

[2]  Norman F. Schneidewind,et al.  An Experiment in Software Error Data Collection and Analysis , 1979, IEEE Transactions on Software Engineering.

[3]  John E. Gaffney,et al.  Estimating the Number of Faults in Code , 1984, IEEE Transactions on Software Engineering.

[4]  Maurice H. Halstead,et al.  A Software Physics Analysis of Akiyama's Debugging Data , 1975 .

[5]  Leslie Hatton Does OO sync with the way we think , 1998 .

[6]  Victor R. Basili,et al.  Analyzing Error-Prone System Structure , 1991, IEEE Trans. Software Eng..

[7]  Victor R. Basili,et al.  Developing Interpretable Models with Optimized Set Reduction for Identifying High-Risk Software Components , 1993, IEEE Trans. Software Eng..

[8]  Javam C. Machado,et al.  The prediction of faulty classes using object-oriented design metrics , 2001, J. Syst. Softw..

[9]  Jeff Tian,et al.  Measurement and defect modeling for a legacy software system , 1995, Ann. Softw. Eng..

[10]  Taghi M. Khoshgoftaar,et al.  Using neural networks to predict software faults during testing , 1996, IEEE Trans. Reliab..

[11]  Maurice H. Halstead,et al.  Elements of software science , 1977 .

[12]  Per Runeson,et al.  A Replicated Quantitative Analysis of Fault Distributions in Complex Software Systems , 2007, IEEE Transactions on Software Engineering.

[13]  Hongfang Liu,et al.  Modeling the Effect of Size on Defect Proneness for Open-Source Software , 2007, 29th International Conference on Software Engineering (ICSE'07 Companion).

[14]  J. Herbsleb,et al.  Two case studies of open source software development: Apache and Mozilla , 2002, TSEM.

[15]  Akif Günes Koru,et al.  An empirical comparison and characterization of high defect and high complexity modules , 2003, J. Syst. Softw..

[16]  Norman E. Fenton,et al.  Quantitative Analysis of Faults and Failures in a Complex Software System , 2000, IEEE Trans. Software Eng..

[17]  D.,et al.  Regression Models and Life-Tables , 2022 .

[18]  J Stare,et al.  Explained variation in survival analysis. , 1996, Statistics in medicine.

[19]  Anas N. Al-Rabadi,et al.  A comparison of modified reconstructability analysis and Ashenhurst‐Curtis decomposition of Boolean functions , 2004 .

[20]  G. D. Frewin,et al.  M.H. Halstead's Software Science - a critical examination , 1982, ICSE '82.

[21]  P. Grambsch,et al.  Martingale-based residuals for survival models , 1990 .

[22]  F. Chayes Ratio Correlation: A Manual for Students of Petrology and Geochemistry , 1971 .

[23]  Norman E. Fenton,et al.  A Critique of Software Defect Prediction Models , 1999, IEEE Trans. Software Eng..

[24]  David W. Hosmer,et al.  Applied Survival Analysis: Regression Modeling of Time-to-Event Data , 2008 .

[25]  Tze-Jie Yu,et al.  Identifying Error-Prone Software—An Empirical Study , 1985, IEEE Transactions on Software Engineering.

[26]  Eric S. Raymond,et al.  The cathedral and the bazaar - musings on Linux and Open Source by an accidental revolutionary , 2001 .

[27]  Adam A. Porter,et al.  Empirically guided software development using metric-based classification trees , 1990, IEEE Software.

[28]  Khaled El Emam,et al.  The Confounding Effect of Class Size on the Validity of Object-Oriented Metrics , 2001, IEEE Trans. Software Eng..

[29]  Frank E. Harrell,et al.  Regression Modeling Strategies: With Applications to Linear Models, Logistic Regression, and Survival Analysis , 2001 .

[30]  Victor R. Basili,et al.  Software errors and complexity: an empirical investigation0 , 1984, CACM.

[31]  Daniel J. Paulish,et al.  An empirical investigation of software fault distribution , 1993, [1993] Proceedings First International Software Metrics Symposium.

[32]  R. Prentice,et al.  Commentary on Andersen and Gill's "Cox's Regression Model for Counting Processes: A Large Sample Study" , 1982 .

[33]  Niels Keiding,et al.  Statistical Models Based on Counting Processes , 1993 .

[34]  Robert L. Glass,et al.  Measuring software design quality , 1990 .

[35]  Khaled El Emam The ROI from Software Quality , 2005 .

[36]  Carol Withrow,et al.  Prediction and control of ADA software defects , 1990, J. Syst. Softw..

[37]  Jean-Claude Laprie,et al.  Software reliability and system reliability , 1996 .

[38]  M. Lipow,et al.  Number of Faults per Line of Code , 1982, IEEE Transactions on Software Engineering.

[39]  Victor R. Basili,et al.  Software errors and complexity: an empirical investigation , 1993 .

[40]  Akif Günes Koru,et al.  Comparing high-change modules and modules with the highest measurement values in two large-scale open-source products , 2005, IEEE Transactions on Software Engineering.

[41]  Linda M. Ottenstein Quantitative Estimates of Debugging Requirements , 1979, IEEE Transactions on Software Engineering.

[42]  Les Hatton,et al.  Reexamining the Fault Density-Component Size Connection , 1997, IEEE Softw..

[43]  Taghi M. Khoshgoftaar,et al.  The Detection of Fault-Prone Programs , 1992, IEEE Trans. Software Eng..

[44]  Audris Mockus,et al.  A case study of open source software development: the Apache server , 2000, Proceedings of the 2000 International Conference on Software Engineering. ICSE 2000 the New Millennium.

[45]  Tibor Gyimóthy,et al.  Empirical validation of object-oriented metrics on open source software for fault prediction , 2005, IEEE Transactions on Software Engineering.

[46]  Akif Günes Koru,et al.  Defect handling in medium and large open source projects , 2004, IEEE Software.

[47]  Taghi M. Khoshgoftaar,et al.  Application of neural networks to software quality modeling of a very large telecommunications system , 1997, IEEE Trans. Neural Networks.

[48]  Khaled El Emam,et al.  The Optimal Class Size for Object-Oriented Software , 2002, IEEE Trans. Software Eng..

[49]  Jarrett Rosenberg,et al.  Some misconceptions about lines of code , 1997, Proceedings Fourth International Software Metrics Symposium.

[50]  Les Hatton,et al.  Does OO Sync with How We Think? , 1998, IEEE Softw..

[51]  Sunil J Rao,et al.  Regression Modeling Strategies: With Applications to Linear Models, Logistic Regression, and Survival Analysis , 2003 .

[52]  Fumio Akiyama,et al.  An Example of Software System Debugging , 1971, IFIP Congress.

[53]  Eric Lease Morgan,et al.  Review of The Cathedral & the Bazaar: Musings on Linux and Open Source by an Accidental Revolutionary by Eric S. Raymond, Sebastopol, Calif.: O'Reilly, 1999 , 2000 .

[54]  Audris Mockus,et al.  Using Version Control Data to Evaluate the Impact of Software Tools: A Case Study of the Version Editor , 2002, IEEE Trans. Software Eng..