Paper information - Judging a commit by its cover; or can a commit message predict build failure?

Developers summarize their changes to code in commit messages. When a message seems “unusual”, however, this casts doubt on the quality of the code contained in the commit. We trained n-gram language models and used cross-entropy as an indicator of commit message “unusualness” for over 120 000 commits from open source projects. Build statuses collected from Travis-CI were used as a proxy for code quality. We then compared the distributions of failed and successful commits with regard to the “unusualness” of their commit messages. Our analysis yielded significant results when correlating cross-entropy with build status.
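To make the notion of cross-entropy as “unusualness” concrete, here is a minimal sketch in Python. It is not the authors' implementation: it assumes a bigram model with add-one smoothing and whitespace tokenization, and the names `tokenize`, `BigramModel`, and `cross_entropy` are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter


def tokenize(message):
    # Assumption: lowercase whitespace tokenization with sentence boundary markers.
    return ["<s>"] + message.lower().split() + ["</s>"]


class BigramModel:
    """Illustrative bigram language model with add-one (Laplace) smoothing."""

    def __init__(self, messages):
        self.context_counts = Counter()  # how often each token appears as a context
        self.bigram_counts = Counter()   # how often each (previous, current) pair appears
        vocab = set()
        for message in messages:
            tokens = tokenize(message)
            vocab.update(tokens)
            for prev, cur in zip(tokens, tokens[1:]):
                self.context_counts[prev] += 1
                self.bigram_counts[(prev, cur)] += 1
        # Reserve one extra slot for tokens never seen in training.
        self.vocab_size = len(vocab) + 1

    def prob(self, prev, cur):
        # Add-one smoothing keeps unseen bigrams from having zero probability.
        numerator = self.bigram_counts[(prev, cur)] + 1
        denominator = self.context_counts[prev] + self.vocab_size
        return numerator / denominator

    def cross_entropy(self, message):
        # Average negative log2 probability per bigram; higher means more "unusual".
        tokens = tokenize(message)
        pairs = list(zip(tokens, tokens[1:]))
        total = -sum(math.log2(self.prob(prev, cur)) for prev, cur in pairs)
        return total / len(pairs)
```

A model trained on a corpus of commit messages would then assign a lower cross-entropy to a message that resembles the corpus than to one made of unseen tokens. Comparing these scores between commits whose builds failed and commits whose builds passed is the kind of analysis the abstract describes.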

Abram Hindle | Eddie A. Santos
