Use and Misuse of Continuous Integration Features: An Empirical Study of Projects That (Mis)Use Travis CI

Continuous Integration (CI) is a popular practice where software systems are automatically compiled and tested as changes appear in the version control system of a project. Like other software artifacts, CI specifications require maintenance effort. Although there are several service providers like Travis CI offering various CI features, it is unclear which features are being (mis)used. In this paper, we present a study of feature use and misuse in 9,312 open source systems that use Travis CI. Analysis of the features that are adopted by projects reveals that explicit deployment code is rare—48.16 percent of the studied Travis CI specification code is instead associated with configuring job processing nodes. To analyze feature misuse, we propose Hansel—an anti-pattern detection tool for Travis CI specifications. We define four anti-patterns and Hansel detects anti-patterns in the Travis CI specifications of 894 projects in the corpus (9.60 percent), and achieves a recall of 82.76 percent in a sample of 100 projects. Furthermore, we propose Gretel—an anti-pattern removal tool for Travis CI specifications, which can remove 69.60 percent of the most frequently occurring anti-pattern automatically. Using Gretel, we have produced 36 accepted pull requests that remove Travis CI anti-patterns automatically.

[1]  Nenad Medvidovic,et al.  Toward a Catalogue of Architectural Bad Smells , 2009, QoSA.

[2]  Fabrizio Cannizzo,et al.  Pushing the Boundaries of Testing and Continuous Integration , 2008, Agile 2008 Conference.

[3]  Jez Humble,et al.  Continuous Delivery: Reliable Software Releases Through Build, Test, and Deployment Automation , 2010 .

[4]  Woo Jin Lee,et al.  Developer Mistakes in Writing Android Manifests: An Empirical Study of Configuration Errors , 2017, 2017 IEEE/ACM 14th International Conference on Mining Software Repositories (MSR).

[5]  Andrew Glover,et al.  Continuous Integration: Improving Software Quality and Reducing Risk (The Addison-Wesley Signature Series) , 2007 .

[6]  Mathias Meyer,et al.  Continuous Integration and Its Tools , 2014, IEEE Software.

[7]  Darko Marinov,et al.  Usage, costs, and benefits of continuous integration in open-source projects , 2016, 2016 31st IEEE/ACM International Conference on Automated Software Engineering (ASE).

[8]  Harald C. Gall,et al.  An Empirical Analysis of the Docker Container Ecosystem on GitHub , 2017, 2017 IEEE/ACM 14th International Conference on Mining Software Repositories (MSR).

[9]  Casper Lassenius,et al.  Problems, causes and solutions when adopting continuous delivery - A systematic literature review , 2017, Inf. Softw. Technol..

[10]  Angelos D. Keromytis,et al.  DoubleCheck: Multi-path verification against man-in-the-middle attacks , 2009, 2009 IEEE Symposium on Computers and Communications.

[11]  Laurie A. Williams,et al.  Characterizing Defective Configuration Scripts Used for Continuous Deployment , 2018, 2018 IEEE 11th International Conference on Software Testing, Verification and Validation (ICST).

[12]  Yann-Gaël Guéhéneuc,et al.  DECOR: A Method for the Specification and Detection of Code and Design Smells , 2010, IEEE Transactions on Software Engineering.

[13]  Georgios Gousios,et al.  How good is your puppet? An empirically defined and validated quality model for puppet , 2018, SANER.

[14]  Shane McIntosh,et al.  Modern Release Engineering in a Nutshell -- Why Researchers Should Care , 2016, 2016 IEEE 23rd International Conference on Software Analysis, Evolution, and Reengineering (SANER).

[15]  Rui Abreu,et al.  Continuous test generation: enhancing continuous integration with automated test generation , 2014, ASE.

[16]  Shane McIntosh,et al.  The evolution of Java build systems , 2012, Empirical Software Engineering.

[17]  William J. Brown,et al.  AntiPatterns and Patterns in Software Configuration Management , 1999 .

[18]  Gregg Rothermel,et al.  Techniques for improving regression testing in continuous integration development environments , 2014, SIGSOFT FSE.

[19]  A. Huberman,et al.  Qualitative Data Analysis: A Methods Sourcebook , 1994 .

[20]  Mika Mäntylä,et al.  The highways and country roads to continuous deployment , 2015, IEEE Software.

[21]  Darko Marinov,et al.  Trade-offs in continuous integration: assurance, security, and flexibility , 2017, ESEC/SIGSOFT FSE.

[22]  Shane McIntosh,et al.  An Empirical Comparison of Model Validation Techniques for Defect Prediction Models , 2017, IEEE Transactions on Software Engineering.

[23]  Georgios Gousios,et al.  Oops, My Tests Broke the Build: An Explorative Analysis of Travis CI with GitHub , 2017, 2017 IEEE/ACM 14th International Conference on Mining Software Repositories (MSR).

[24]  Diomidis Spinellis,et al.  Does Your Configuration Code Smell? , 2016, 2016 IEEE/ACM 13th Working Conference on Mining Software Repositories (MSR).

[25]  Premkumar T. Devanbu,et al.  Quality and productivity outcomes relating to continuous integration in GitHub , 2015, ESEC/SIGSOFT FSE.

[26]  A. Scott,et al.  A Cluster Analysis Method for Grouping Means in the Analysis of Variance , 1974 .

[27]  Stefan Biffl,et al.  Communicating continuous integration servers for increasing effectiveness of automated testing , 2012, 2012 Proceedings of the 27th IEEE/ACM International Conference on Automated Software Engineering.

[28]  Jan Bosch,et al.  Modeling continuous integration practice differences in industry software development , 2014, J. Syst. Softw..

[29]  Wolfgang De Meuter,et al.  The Evolution of the Linux Build System , 2007, Electron. Commun. Eur. Assoc. Softw. Sci. Technol..

[30]  Daniela E. Damian,et al.  The promises and perils of mining GitHub , 2009, MSR 2014.

[31]  Arjun Guha,et al.  Rehearsal: a configuration verification tool for puppet , 2015, PLDI.

[32]  Radu Marinescu,et al.  Detection strategies: metrics-based rules for detecting design flaws , 2004, 20th IEEE International Conference on Software Maintenance, 2004. Proceedings..