How Often Does a Source Code Unit Change within a Release Window?

Forming a training set for a source-code change prediction model, e.g., one built with association rule mining or machine learning techniques, requires commits from the source code history. Traceability between releases and commits would allow that history to be chosen systematically in units of the project's evolution scale (i.e., the commits that constitute a software release). For example, the major release 25.0 of Chrome maps to the earliest revision 157687 and the latest revision 165096 in the trunk. Using this traceability, an empirical study is reported on the frequency distribution of file changes for different release windows. In Chrome, the majority (50%) of the committed files change only once between a pair of consecutive releases. This trend reverses once the window is expanded to at least 10 releases. That is, the majority (50%) of the files change multiple times when the commits constituting 10 or more releases are considered. These results suggest that a training set of at least 10 releases is needed to provide prediction coverage for the majority of the files.
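The measurement described above can be illustrated with a minimal sketch. It assumes that each release has already been mapped to an inclusive (earliest revision, latest revision) pair and that each commit is available as a (revision, changed files) pair. The function names, the data layout, and the toy data are hypothetical and are not taken from the study's tooling.

```python
from collections import Counter


def window_range(release_ranges, start, size):
    """Return the (earliest, latest) revision spanned by `size` consecutive releases.

    `release_ranges` is a list of (earliest_rev, latest_rev) pairs, one per
    release, in chronological order (an assumed layout).
    """
    window = release_ranges[start:start + size]
    if len(window) < size:
        raise ValueError("not enough releases for the requested window")
    return window[0][0], window[-1][1]


def change_counts(commits, first_rev, last_rev):
    """Count, per file, the commits in [first_rev, last_rev] that touch it."""
    counts = Counter()
    for rev, files in commits:
        if first_rev <= rev <= last_rev:
            counts.update(set(files))  # a file counts once per commit
    return counts


def share_changed_once(counts):
    """Fraction of the changed files that were changed in exactly one commit."""
    if not counts:
        return 0.0
    once = sum(1 for n in counts.values() if n == 1)
    return once / len(counts)


# Toy history (hypothetical data): five commits spread over two releases.
commits = [
    (1, ["a.cc", "b.cc"]),
    (2, ["a.cc"]),
    (3, ["c.cc"]),
    (4, ["b.cc", "c.cc"]),
    (5, ["d.cc"]),
]
release_ranges = [(1, 2), (3, 5)]

for size in (1, 2):
    first, last = window_range(release_ranges, 0, size)
    share = share_changed_once(change_counts(commits, first, last))
    print(f"window of {size} release(s): {share:.2f} of files changed once")
```

On this toy history the share of files changed only once drops as the window grows from one release to two, which is the kind of shift the study reports for Chrome at a much larger scale.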
