An Empirical Study of Class Sizes for Large Java Systems

We perform an empirical study of class sizes (in terms of lines of code) in a number of large Java software systems and discover an interesting pattern: many classes are small, whereas a few classes are large. We call this the small class phenomenon. Further analysis shows that class sizes follow a lognormal distribution. Having characterized the distribution of class sizes, we derive a general size estimation model that relates the size of a large Java system to the number of classes it contains. We also show that the adoption of object-orientation is a possible cause of the small class phenomenon. We believe our study reveals a regularity that emerges from large-scale object-oriented software construction, and we hope it contributes to a deeper understanding of computer programming.
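The abstract gives neither the fitted parameters nor the exact form of the size estimation model. As an illustration only, the sketch below assumes that class sizes (in LOC) are independent draws from a lognormal distribution. It fits the parameters μ and σ by maximum likelihood, which amounts to taking the mean and population standard deviation of ln(size). It then estimates total system size as N · exp(μ + σ²/2), where N is the number of classes and exp(μ + σ²/2) is the mean of a lognormal variable. The function names, the helper for the "small class" share, and the sample sizes are hypothetical and are not taken from the paper.

```python
import math
from statistics import mean, pstdev


def fit_lognormal(sizes):
    """Maximum-likelihood lognormal fit: mu and sigma are the mean and the
    population (1/n) standard deviation of ln(size)."""
    if not sizes or any(s <= 0 for s in sizes):
        raise ValueError("need a non-empty list of positive LOC counts")
    logs = [math.log(s) for s in sizes]
    return mean(logs), pstdev(logs)


def expected_class_size(mu, sigma):
    # Mean of a lognormal random variable: exp(mu + sigma^2 / 2)
    return math.exp(mu + sigma ** 2 / 2)


def estimate_system_size(num_classes, mu, sigma):
    # Hypothetical model: total LOC ~ number of classes x expected class size
    return num_classes * expected_class_size(mu, sigma)


def share_below_mean(sizes):
    # Fraction of classes smaller than the average class size; a large value
    # is one way to quantify "many small classes, few large ones"
    avg = mean(sizes)
    return sum(s < avg for s in sizes) / len(sizes)


if __name__ == "__main__":
    # Hypothetical class sizes in LOC, not data from the study
    sizes = [12, 25, 40, 8, 300, 60, 15, 1200, 33, 20]
    mu, sigma = fit_lognormal(sizes)
    print(f"mu={mu:.3f}, sigma={sigma:.3f}")
    print(f"share of classes below mean size: {share_below_mean(sizes):.2f}")
    print(f"estimated LOC for 5000 classes: {estimate_system_size(5000, mu, sigma):.0f}")
```

Fitting on the log scale keeps the few very large classes from dominating the parameter estimates, which is the practical reason a lognormal model suits such a skewed size distribution.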
