Noise and Heterogeneity in Historical Build Data: An Empirical Study of Travis CI

Automated builds, which may pass or fail, provide feedback to a development team about changes to the codebase. A passing build indicates that the change compiles cleanly and tests (continue to) pass. A failing (a.k.a., broken) build indicates that there are issues that require attention. Without a closer analysis of the nature of build outcome data, practitioners and researchers are likely to make two critical assumptions: (1) build results are not noisy; however, passing builds may contain failing or skipped jobs that are actively or passively ignored; and (2) builds are equal; however, builds vary in terms of the number of jobs and configurations. To investigate the degree to which these assumptions about build breakage hold, we perform an empirical study of 3.7 million build jobs spanning 1,276 open source projects. We find that: (1) 12% of passing builds have an actively ignored failure; (2) 9% of builds have a misleading or incorrect outcome on average; and (3) at least 44% of the broken builds contain passing jobs, i.e., the breakage is local to a subset of build variants. Like other software archives, build data is noisy and complex. Analysis of build data requires nuance.

[1]  Foyzul Hassan,et al.  Automatic Building of Java Projects in Software Repositories: A Study on Feasibility and Challenges , 2017, 2017 ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM).

[2]  Andy Zaidman,et al.  A Tale of CI Build Failures: An Open Source and a Financial Organization Perspective , 2017, ICSME.

[3]  Andy Zaidman,et al.  Analyzing the State of Static Analysis: A Large-Scale Evaluation in Open Source Software , 2016, 2016 IEEE 23rd International Conference on Software Analysis, Evolution, and Reengineering (SANER).

[4]  Peter Kampstra,et al.  Beanplot: A Boxplot Alternative for Visual Comparison of Distributions , 2008 .

[5]  Gabriele Bavota,et al.  There and back again: Can you compile that snapshot? , 2017, J. Softw. Evol. Process..

[6]  Robert W. Bowdidge,et al.  Programmers' build errors: a case study (at google) , 2014, ICSE.

[7]  Daniela E. Damian,et al.  Does Socio-Technical Congruence Have an Effect on Software Build Success? A Study of Coordination in a Software Project , 2011, IEEE Transactions on Software Engineering.

[8]  Yann-Gaël Guéhéneuc,et al.  Do Not Trust Build Results at Face Value - An Empirical Study of 30 Million CPAN Builds , 2017, 2017 IEEE/ACM 14th International Conference on Mining Software Repositories (MSR).

[9]  Ahmed E. Hassan,et al.  Using Decision Trees to Predict the Certification Result of a Build , 2006, 21st IEEE/ACM International Conference on Automated Software Engineering (ASE'06).

[10]  Alexander Serebrenik,et al.  Continuous Integration in a Social-Coding World: Empirical Evidence from GitHub , 2014, 2014 IEEE International Conference on Software Maintenance and Evolution.

[11]  Shane McIntosh,et al.  Modern Release Engineering in a Nutshell -- Why Researchers Should Care , 2016, 2016 IEEE 23rd International Conference on Software Analysis, Evolution, and Reengineering (SANER).

[12]  Darko Marinov,et al.  Trade-offs in continuous integration: assurance, security, and flexibility , 2017, ESEC/SIGSOFT FSE.

[13]  Audris Mockus,et al.  A Large-Scale Empirical Study of the Relationship between Build Technology and Build Maintenance , 2014, Empirical Software Engineering.

[14]  Reid Holmes,et al.  Measuring the cost of regression testing in practice: a study of Java projects using continuous integration , 2017, ESEC/SIGSOFT FSE.

[15]  Daniela E. Damian,et al.  Predicting build failures using social network analysis on developer communication , 2009, 2009 IEEE 31st International Conference on Software Engineering.

[16]  Philipp Leitner,et al.  An Empirical Analysis of Build Failures in the Continuous Integration Workflows of Java-Based Open-Source Software , 2017, 2017 IEEE/ACM 14th International Conference on Mining Software Repositories (MSR).

[17]  Gerardo Canfora,et al.  How Open Source Projects Use Static Code Analysis Tools in Continuous Integration Pipelines , 2017, 2017 IEEE/ACM 14th International Conference on Mining Software Repositories (MSR).

[18]  E. Burton Swanson,et al.  The dimensions of maintenance , 1976, ICSE '76.

[19]  Shane McIntosh,et al.  Automatically repairing dependency-related build breakage , 2018, 2018 IEEE 25th International Conference on Software Analysis, Evolution and Reengineering (SANER).

[20]  Georgios Gousios,et al.  TravisTorrent: Synthesizing Travis CI and GitHub for Full-Stack Research on Continuous Integration , 2017, 2017 IEEE/ACM 14th International Conference on Mining Software Repositories (MSR).

[21]  Tijs van der Storm,et al.  Backtracking Incremental Continuous Integration , 2008, 2008 12th European Conference on Software Maintenance and Reengineering.

[22]  Darko Marinov,et al.  Usage, costs, and benefits of continuous integration in open-source projects , 2016, 2016 31st IEEE/ACM International Conference on Automated Software Engineering (ASE).

[23]  Foutse Khomh,et al.  Why Do Automated Builds Break? An Empirical Study , 2014, 2014 IEEE International Conference on Software Maintenance and Evolution.

[24]  Lin Chen,et al.  What are the Factors Impacting Build Breakage? , 2017, 2017 14th Web Information Systems and Applications Conference (WISA).

[25]  Beryl Plimmer,et al.  Ambient awareness of build status in collocated software teams , 2012, 2012 34th International Conference on Software Engineering (ICSE).

[26]  Audris Mockus,et al.  Software Support Tools and Experimental Work , 2006, Empirical Software Engineering Issues.

[27]  Harald C. Gall,et al.  Un-break My Build: Assisting Developers with Build Repair Hints , 2018, 2018 IEEE/ACM 26th International Conference on Program Comprehension (ICPC).

[28]  Davor Svetinovic,et al.  Continuous integration build breakage rationale: Travis data case study , 2017, 2017 International Conference on Infocom Technologies and Unmanned Systems (Trends and Future Directions) (ICTUS).

[29]  Premkumar T. Devanbu,et al.  Quality and productivity outcomes relating to continuous integration in GitHub , 2015, ESEC/SIGSOFT FSE.