Gang of Eight: A Defect Taxonomy for Infrastructure as Code Scripts

Defects in infrastructure as code (IaC) scripts can have serious consequences, such as large-scale system outages. A taxonomy of IaC defects can help in understanding the nature of these defects and in identifying the activities needed to fix and prevent them. The goal of this paper is to help practitioners improve the quality of IaC scripts by developing a defect taxonomy for IaC scripts through qualitative analysis. We develop the taxonomy by applying qualitative analysis to 1,448 defect-related commits collected from open source software (OSS) repositories of the OpenStack organization. We survey 66 practitioners to assess whether they agree with the defect categories included in our taxonomy. We quantify the frequency of the identified defect categories by analyzing 80,425 commits collected from 291 OSS repositories spanning 2005 to 2019. Our defect taxonomy for IaC consists of eight categories, including one specific to IaC called idempotency (i.e., defects that lead to incorrect system provisioning when the same IaC script is executed multiple times). Among the eight categories, the 66 surveyed practitioners agree most with idempotency. The most frequent defect category is configuration data, i.e., providing erroneous configuration data in IaC scripts. Our taxonomy and the quantified frequency of the defect categories may help advance the science of IaC script quality.
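To make the idempotency category concrete, the following is a minimal sketch, not taken from the paper or from any real IaC tool. It models a provisioning step as a pure function over the lines of a configuration file. The function names (`append_line_naive`, `ensure_line`, `run_twice`) and the sample settings are hypothetical and chosen only for illustration.

```python
from typing import Callable, List, Tuple

Step = Callable[[List[str], str], List[str]]


def append_line_naive(config_lines: List[str], line: str) -> List[str]:
    """Non-idempotent step (hypothetical): every run appends the line again."""
    return config_lines + [line]


def ensure_line(config_lines: List[str], line: str) -> List[str]:
    """Idempotent step (hypothetical): appends the line only if it is absent."""
    if line in config_lines:
        return list(config_lines)
    return config_lines + [line]


def run_twice(step: Step, initial: List[str], line: str) -> Tuple[List[str], List[str]]:
    """Apply the same step twice, as if the same script were executed two times."""
    once = step(initial, line)
    twice = step(once, line)
    return once, twice


if __name__ == "__main__":
    start = ["port=80"]
    for step in (append_line_naive, ensure_line):
        once, twice = run_twice(step, start, "debug=false")
        # A step is idempotent here when the second run changes nothing.
        print(step.__name__, "idempotent:", once == twice)
```

Under these assumptions, the naive step leaves the system in a different state after the second run (a duplicated setting), which is the kind of behavior the idempotency category describes. The second step checks the current state before changing it, so repeated runs converge to the same result.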
