Towards a better understanding of software evolution: An empirical study on open source software

Software evolution is a fact of life. Over the past thirty years, researchers have proposed hypotheses on how software changes, and provided evidence that both supports and refutes these hypotheses. To paint a clearer picture of the software evolution process, we performed an empirical study on long spans in the lifetime of seven open source projects. Our analysis covers 653 official releases and a combined 69 years of evolution. First, we tried to verify Lehman's laws of software evolution. Our findings indicate that several of these laws are confirmed, while the rest can be either confirmed or refuted depending on the laws' operational definitions. Second, we analyzed the growth rate of the projects' development and maintenance branches, and the distribution of software changes. We found similarities in the evolution patterns of the programs we studied, which brings us closer to constructing rigorous models for software evolution.
