Estimating Commit Sizes Efficiently

The quantitative analysis of software projects can provide insights that help us better understand open source and other software development projects. An important variable in such analyses is the amount of work contributed in a commit, the commit size. Unfortunately, after the fact, the commit size can only be estimated, not measured. This paper presents several algorithms for estimating the commit size. Our performance evaluation shows that simple, straightforward heuristics are superior to the more complex text-analysis-based algorithms. Not only are the heuristics significantly faster to compute, they also deliver more accurate commit size estimates. Based on this experience, we design and present an algorithm that improves on the heuristics, can be computed equally fast, and is more accurate than any of the prior approaches.
