The ‘as code’ activities: development anti-patterns for infrastructure as code

The ‘as code’ suffix in infrastructure as code (IaC) refers to applying software engineering activities, such as version control, to maintain IaC scripts. Without these activities, defects with serious consequences may be introduced into IaC scripts. A systematic investigation of development anti-patterns for IaC scripts can guide practitioners in identifying activities that help avoid defects in IaC scripts. Development anti-patterns are recurring development activities that relate to defective IaC scripts. The goal of this paper is to help practitioners improve the quality of IaC scripts by identifying development activities that relate to defective IaC scripts. We identify development anti-patterns using a mixed-methods approach: we apply quantitative analysis to 2,138 open source IaC scripts and survey 51 practitioners. From our quantitative analysis, we observe five development activities to be related to defective IaC scripts. We identify five development anti-patterns, namely ‘boss is not around’, ‘many cooks spoil’, ‘minors are spoiler’, ‘silos’, and ‘unfocused contribution’. These anti-patterns suggest the importance of ‘as code’ activities in IaC, because these activities are related to the quality of IaC scripts.
