BUILDFAST: History-Aware Build Outcome Prediction for Fast Feedback and Reduced Cost in Continuous Integration

Long build times in continuous integration (CI) can greatly increase the cost in human and computing resources, and have thus become a common barrier for software organizations adopting CI. Build outcome prediction has been proposed as one remedy to reduce this cost. However, state-of-the-art approaches perform poorly when predicting failed builds and are not designed for practical usage scenarios. To address these problems, we first conduct an empirical study on 2,590,917 builds to characterize build times in real-world projects, and a survey of 75 developers to understand their perceptions of build outcome prediction. Then, motivated by the results of the study and survey, we propose a new history-aware approach, named BUILDFAST, to predict CI build outcomes cost-efficiently and practically. We develop multiple failure-specific features from closely related historical builds by analyzing build logs and changed files, and propose an adaptive prediction model that switches between two models based on the outcome of the previous build. We investigate a practical online usage scenario of BUILDFAST, in which builds are predicted in chronological order, and measure the benefit of correct predictions and the cost of incorrect predictions. Our experiments on 20 projects show that BUILDFAST improves on the state of the art by 47.5% in F1-score for failed builds.
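
The adaptive switching and chronological evaluation described above can be illustrated with a minimal Python sketch. It is not the BUILDFAST implementation: the names `AdaptivePredictor` and `replay`, the `changed_files` feature, the threshold-based stand-in models, and the unit-count notion of benefit and cost are all hypothetical, and the sketch assumes that the true outcome of the previous build is known when the next build is predicted.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Features = Dict[str, float]
# A model maps build features to True when it predicts a failed build.
Model = Callable[[Features], bool]


@dataclass
class AdaptivePredictor:
    """Hypothetical wrapper that picks one of two models per build."""
    after_pass: Model  # used when the previous build passed
    after_fail: Model  # used when the previous build failed

    def predict(self, features: Features, previous_failed: bool) -> bool:
        model = self.after_fail if previous_failed else self.after_pass
        return model(features)


def replay(predictor: AdaptivePredictor,
           builds: List[Tuple[Features, bool]]) -> Tuple[int, int]:
    """Predict builds in chronological order.

    Returns (correct, incorrect) prediction counts, a simplified
    stand-in for the benefit and cost measured in the paper.
    Assumption: the first build is treated as following a passed build.
    """
    previous_failed = False
    correct = incorrect = 0
    for features, actually_failed in builds:
        predicted_failed = predictor.predict(features, previous_failed)
        if predicted_failed == actually_failed:
            correct += 1
        else:
            incorrect += 1
        # Assumption: the real outcome becomes known before the next build.
        previous_failed = actually_failed
    return correct, incorrect


if __name__ == "__main__":
    # Toy threshold models; a real system would use trained classifiers.
    predictor = AdaptivePredictor(
        after_pass=lambda f: f["changed_files"] > 10,
        after_fail=lambda f: f["changed_files"] > 2,
    )
    history = [
        ({"changed_files": 3}, False),
        ({"changed_files": 12}, True),
        ({"changed_files": 3}, True),
        ({"changed_files": 1}, False),
        ({"changed_files": 5}, True),
    ]
    print(replay(predictor, history))
```

Keeping two separate models reflects the idea that a build following a failure may need a different decision rule from a build following a success; how the two models are trained, and how benefit and cost are actually quantified, is left to the paper.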
