Analyzing test driven development based on GitHub evidence

Testing is an integral part of the software development lifecycle, approached with varying degrees of rigor by different process models. Agile process models advocate Test Driven Development (TDD) as one of their key practices for reducing costs and improving code quality. In this paper, we compare GitHub repositories that adopt TDD against repositories that do not, in order to determine how TDD affects a number of variables related to productivity and developer satisfaction, two aspects that should be considered in a cost-benefit analysis of the paradigm. We searched GitHub and found that a relatively small subset of Java-based repositories appear to adopt TDD, and that an even smaller subset can be confidently identified as rigorously adhering to it. For comparison purposes, we created two same-size control sets of repositories. We then compared the TDD and control repositories in terms of number of test files, average commit velocity, number of commits that reference bugs, number of issues recorded, use of continuous integration, and the sentiment of their developers' commits. We found some interesting and significant differences between the two groups, including higher commit velocity and an increased likelihood of continuous integration for TDD repositories.
