An Empirical Analysis of Build Failures in the Continuous Integration Workflows of Java-Based Open-Source Software

Continuous Integration (CI) has become a common practice in both industrial and open-source software development. While CI has evidently improved aspects of the software development process, errors during CI builds pose a threat to development efficiency. As an increasing amount of time goes into fixing such errors, failing builds can significantly impair the development process and become very costly. We perform an in-depth analysis of build failures in CI environments. Our approach links repository commits to the data of their corresponding CI builds. Using data from 14 open-source Java projects, we first identify 14 common error categories. Besides test failures, which are by far the most common error category (exceeding 80% in some projects), we also identify noisy build data, e.g., induced by transient Git interaction errors or general infrastructure flakiness. Second, we analyze which factors impact the build results, taking into account both general process metrics and CI-specific metrics. Our results indicate that process metrics have a significant impact on the build outcome in 8 of the 14 projects on average, but the strongest influencing factor across all projects is the overall stability of the recent build history. For 10 projects, more than 50% (up to 80%) of all failed builds follow a previous build failure. Moreover, the fail ratio of the last k=10 builds has a significant impact on build results for all projects in our dataset.
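The two build-history measures mentioned above can be illustrated with a minimal sketch. It assumes a chronologically ordered list of build outcomes for one project, with True meaning that the build failed. The function names, the handling of the first build (which has no history), and the use of a shorter window early in the history are assumptions made for illustration, not the definitions used in the study.

```python
from collections import deque
from typing import List, Optional, Sequence


def recent_fail_ratios(outcomes: Sequence[bool], k: int = 10) -> List[Optional[float]]:
    """For each build, return the share of failures among the up-to-k builds before it.

    outcomes[i] is True when build i failed. The first build has no history,
    so its ratio is None (an assumption of this sketch).
    """
    window = deque(maxlen=k)  # holds the outcomes of the most recent k builds
    ratios: List[Optional[float]] = []
    for failed in outcomes:
        # The ratio is computed before the current build enters the window,
        # so it only reflects builds that preceded the current one.
        ratios.append(sum(window) / len(window) if window else None)
        window.append(failed)
    return ratios


def share_of_failures_after_failure(outcomes: Sequence[bool]) -> Optional[float]:
    """Return the fraction of failed builds whose direct predecessor also failed.

    A failing first build is skipped because it has no predecessor
    (an assumption of this sketch). Returns None if no failed build qualifies.
    """
    predecessors = [prev for prev, cur in zip(outcomes, outcomes[1:]) if cur]
    if not predecessors:
        return None
    return sum(predecessors) / len(predecessors)


# Hypothetical build history: True = failed build, False = passed build.
history = [False, True, True, False, True, True, True, False]
ratios = recent_fail_ratios(history, k=3)
follow_share = share_of_failures_after_failure(history)
```

In this hypothetical history, three of the five failed builds directly follow another failed build, which is the kind of clustering the abstract reports for most projects.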
