An Evolutionary Study of Configuration Design and Implementation in Cloud Systems

Many techniques have been proposed for detecting software misconfigurations in cloud systems and for diagnosing unintended behavior caused by such misconfigurations. Detection and diagnosis are steps in the right direction: misconfigurations cause many costly failures and severe performance issues. However, we argue that the continued focus on detection and diagnosis is symptomatic of a more serious problem: configuration design and implementation are not yet first-class software engineering endeavors in cloud systems. Little is known about how and why developers evolve configuration design and implementation, or about the challenges they face in doing so. This paper presents a source-code-level study of the evolution of configuration design and implementation in cloud systems. Our goal is to understand the rationale and developer practices for revising initial configuration design and implementation decisions, especially in response to the consequences of misconfigurations. To this end, we studied 1,178 configuration-related commits from a 2.5-year version-control history of four large-scale, actively maintained open-source cloud systems (HDFS, HBase, Spark, and Cassandra). We derive new insights into the software configuration engineering process. Our results motivate new techniques for proactively reducing misconfigurations by improving the configuration design and implementation process in cloud systems. We also highlight a number of future research directions.
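To make "configuration implementation" concrete, here is a minimal sketch of an option whose implementation rejects invalid values with an actionable message at load time, instead of letting them surface later as failures that must be detected or diagnosed. The `Option` helper, the `ConfigError` type, and the option name `example.handler.threads` are all hypothetical illustrations; none of them comes from HDFS, HBase, Spark, or Cassandra, and this is not the paper's method.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Generic, TypeVar

T = TypeVar("T")


class ConfigError(ValueError):
    """Hypothetical error type: raised when a configuration value is rejected at load time."""


@dataclass(frozen=True)
class Option(Generic[T]):
    """Hypothetical helper bundling an option's name, type, default, and constraint."""

    name: str
    parse: Callable[[str], T]    # converts the raw string into a typed value
    default: T                   # used when the user does not set the option
    check: Callable[[T], bool]   # returns True for valid values
    constraint: str              # human-readable description of valid values

    def load(self, raw: Dict[str, str]) -> T:
        # Unset options fall back to the documented default.
        if self.name not in raw:
            return self.default
        text = raw[self.name]
        # Parse errors are reported with the option name and the offending value.
        try:
            value = self.parse(text.strip())
        except ValueError as exc:
            raise ConfigError(f"{self.name}={text!r}: cannot parse ({exc})") from exc
        # Out-of-range values fail fast, naming the expected constraint.
        if not self.check(value):
            raise ConfigError(f"{self.name}={text!r}: expected {self.constraint}")
        return value


# Hypothetical option; the name and bounds are illustrative only.
HANDLER_THREADS = Option(
    name="example.handler.threads",
    parse=int,
    default=10,
    check=lambda n: 1 <= n <= 1024,
    constraint="an integer in [1, 1024]",
)
```

For example, `HANDLER_THREADS.load({})` returns the default `10`, `HANDLER_THREADS.load({"example.handler.threads": "64"})` returns `64`, and a value such as `"0"` or `"ten"` raises `ConfigError` naming the option and the valid range. Keeping the type, default, and constraint next to the option's name is one design choice that makes the option's contract checkable in one place.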
