Towards a Theoretical Model for Software Growth

Software growth (and, more broadly, software evolution) is usually considered in terms of the size or complexity of source code. However, different studies usually use different metrics, which makes it difficult to compare approaches and results. In addition, not all metrics are equally easy to calculate for a given body of source code, which raises the question of which one is the easiest to calculate without losing too much information. To address both issues, in this paper we present a comprehensive study based on the analysis of about 700,000 C source code files, calculating several size and complexity metrics for all of them. For this sample, we found double Pareto statistical distributions for all metrics considered, and a high correlation between any two of them. This would imply that any model addressing software growth should produce these double Pareto distributions, and that analyses based on any of the considered metrics should show a similar pattern, provided the sample of files considered is large enough.
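As a rough illustration of the kind of per-file measurement and pairwise comparison described above, the following minimal sketch computes three simple metrics for a C source string and the Pearson correlation between two metric series. The metric definitions and the names `file_metrics`, `pearson`, and `BRANCH_RE` are assumptions made for this example; they are not the tools or exact metrics used in the study.

```python
import math
import re

# Hypothetical, simplified branch detector: counts decision keywords and
# short-circuit operators. It does not skip comments or string literals.
BRANCH_RE = re.compile(r"\b(?:if|for|while|case)\b|&&|\|\||\?")


def file_metrics(source: str) -> dict:
    """Return simple size and complexity metrics for one C source string."""
    lines = source.splitlines()
    # Non-blank physical lines, used here as a crude stand-in for SLOC.
    sloc = sum(1 for line in lines if line.strip())
    # Decision points + 1, a rough approximation of cyclomatic complexity.
    complexity = len(BRANCH_RE.findall(source)) + 1
    return {"loc": len(lines), "sloc": sloc, "complexity": complexity}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    sample = (
        "int f(int x) {\n"
        "    if (x > 0 && x < 10) {\n"
        "        return 1;\n"
        "    }\n"
        "\n"
        "    return 0;\n"
        "}\n"
    )
    print(file_metrics(sample))
```

On a large sample, one would collect `file_metrics` for every file and call `pearson` on each pair of metric columns. Because file-size data of this kind is heavy-tailed, comparing the logarithms of the values may be more informative than comparing the raw values.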
