What's a Typical Commit? A Characterization of Open Source Software Repositories

The research examines the version histories of nine open source software systems to uncover trends and characteristics of how developers commit source code to version control systems (e.g., subversion). The goal is to characterize what a typical or normal commit looks like with respect to the number of files, number of lines, and number of hunks committed together. The results of these three characteristics are presented and the commits are categorized from extra small to extra large. The findings show that approximately 75% of commits are quite small for the systems examined along all three characteristics. Additionally, the commit messages are examined along with the characteristics. The most common words are extracted from the commit messages and correlated with the size categories of the commits. It is observed that sized categories can be indicative of the types of maintenance activities being performed.

[1]  Dewayne E. Perry,et al.  Towards Understanding the Rhetoric of Small Changes , 2004, MSR.

[2]  Dewayne E. Perry,et al.  Toward understanding the rhetoric of small source code changes , 2005, IEEE Transactions on Software Engineering.

[3]  Gerardo Canfora,et al.  Fine grained indexing of software repositories to support impact analysis , 2006, MSR '06.

[4]  Julie Beth Lovins,et al.  Development of a stemming algorithm , 1968, Mech. Transl. Comput. Linguistics.

[5]  J. Herbsleb,et al.  Two case studies of open source software development: Apache and Mozilla , 2002, TSEM.

[6]  Gail C. Murphy,et al.  Who should fix this bug? , 2006, ICSE.

[7]  Jesús M. González-Barahona,et al.  Mining large software compilations over time: another perspective of software evolution , 2006, MSR '06.

[8]  Michael W. Godfrey,et al.  Release Pattern Discovery via Partitioning: Methodology and Case Study , 2007, Fourth International Workshop on Mining Software Repositories (MSR'07:ICSE Workshops 2007).

[9]  James M. Bieman,et al.  The FreeBSD project: a replication case study of open source development , 2005, IEEE Transactions on Software Engineering.

[10]  Richard A. Johnson,et al.  Applied Multivariate Statistical Analysis , 1983 .

[11]  Jonathan I. Maletic,et al.  Journal of Software Maintenance and Evolution: Research and Practice Survey a Survey and Taxonomy of Approaches for Mining Software Repositories in the Context of Software Evolution , 2022 .

[12]  Ramakrishnan Srikant,et al.  Mining sequential patterns , 1995, Proceedings of the Eleventh International Conference on Data Engineering.

[13]  Qing Zhang,et al.  CVSSearch: searching through source code using CVS comments , 2001, Proceedings IEEE International Conference on Software Maintenance. ICSM 2001.

[14]  Daniel German,et al.  Mining CVS repositories, the softChange experience , 2004, MSR.