The Optimal Class Size for Object-Oriented Software

A growing body of literature suggests that there is an optimal size for software components. This means that components that are too small or too big will have a higher defect content (i.e., there is a U-shaped curve relating defect content to size). The U-shaped curve has become known as the “Goldilocks Conjecture”. Recently, a cognitive theory has been proposed to explain this phenomenon, and it has been expanded to characterize object-oriented software. This conjecture has wide implications for software engineering practice. It suggests (1) that designers should deliberately strive to design classes of the optimal size, (2) that program decomposition is harmful, and (3) that there exists a maximum (threshold) class size that should not be exceeded if the software is to have fewer faults. The purpose of the current paper is to evaluate this conjecture for object-oriented systems. We first demonstrate that the claim of an optimal component/class size (1 above) and the claim that smaller components/classes have a greater defect content (2 above) are due to a mathematical artifact in the analyses performed previously. We then empirically test the threshold effect claim of this conjecture (3 above). To our knowledge, size threshold effects have not previously been tested empirically for object-oriented systems. We perform an initial study with an industrial C++ system, and replicate it twice, on another C++ system and on a commercial Java application. Our results provide unambiguous evidence that there is no threshold effect of class size. We obtained the same result for all three systems using four different size measures. These findings suggest that there is a simple continuous relationship between class size and faults, and that the three conjectures (an optimal class size, smaller classes having more defects, and a size threshold) have no sound theoretical or empirical basis.
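The abstract does not say what the mathematical artifact is. One plausible candidate, assumed here purely for illustration, is ratio correlation: when defect density (faults divided by size) is compared against size, size appears on both sides, so a negative association can appear even when fault counts are unrelated to size. The sketch below simulates that situation with synthetic data; the function names, size range, and fault distribution are all hypothetical and are not taken from the paper.

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def simulate(n=5000, seed=1):
    """Return (corr(size, faults), corr(size, faults / size)).

    Assumption for illustration only: fault counts are drawn
    independently of size, so any size/density association is
    produced by the division itself, not by the data.
    """
    rng = random.Random(seed)
    sizes = [rng.uniform(10, 1000) for _ in range(n)]   # synthetic class sizes
    faults = [rng.randint(0, 10) for _ in range(n)]     # independent of size
    density = [f / s for f, s in zip(faults, sizes)]    # faults per unit size
    return pearson(sizes, faults), pearson(sizes, density)


if __name__ == "__main__":
    r_raw, r_density = simulate()
    print(f"size vs faults:        r = {r_raw:.3f}")
    print(f"size vs fault density: r = {r_density:.3f}")
```

Under these assumptions the raw correlation stays close to zero while the density correlation comes out clearly negative, which is the pattern that can be misread as small classes being more defect-prone.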
