How scale affects structure in Java programs

Many internal software metrics and external quality attributes of Java programs correlate strongly with program size. This knowledge has been used pervasively in quantitative studies of software, through practices such as normalization on size metrics. This paper reports size-related superlinear and sublinear effects that were not previously known. Findings obtained on a very large collection of Java programs (30,911 projects hosted at Google Code as of Summer 2011) unveil how certain characteristics of programs vary disproportionately with program size, sometimes even non-monotonically. The specific parameters of many of these nonlinear relations are reported. These results give further insight into the differences between "programming in the small" and "programming in the large." The reported findings carry important consequences for OO software metrics, and for software research in general: metrics that are known to correlate with size can now be properly normalized, so that all the information left in them is size-independent.
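As an illustration of what size normalization under a nonlinear relation can look like, the following is a minimal sketch in Python. It assumes a power-law relation, metric ≈ a · size^b, fitted by least squares on log-transformed values; the abstract does not state which functional forms the paper uses, so this choice is an assumption. The function names (`fit_power_law`, `normalize`) and the data are hypothetical and synthetic, not taken from the paper.

```python
import math


def fit_power_law(sizes, values):
    """Fit value ~ a * size**b by least squares on log-log data.

    Assumption: a power law is only one possible nonlinear form;
    it is used here purely for illustration.
    """
    xs = [math.log(s) for s in sizes]
    ys = [math.log(v) for v in values]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                       # exponent: b > 1 superlinear, b < 1 sublinear
    a = math.exp(mean_y - b * mean_x)   # multiplicative constant
    return a, b


def normalize(sizes, values, a, b):
    """Divide each value by the size-predicted value a * size**b."""
    return [v / (a * s ** b) for s, v in zip(sizes, values)]


# Synthetic example: a metric that grows superlinearly with size.
sizes = [10, 100, 1000, 10000]
values = [2.0 * s ** 1.2 for s in sizes]

a, b = fit_power_law(sizes, values)
residuals = normalize(sizes, values, a, b)
print(f"a={a:.3f}, b={b:.3f}")
print(residuals)
```

Dividing by plain size would leave a size-dependent trend in the result whenever b differs from 1; dividing by the fitted a · size^b removes that trend under the stated assumption, so what remains varies only with factors other than size.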
