Synthesizing configuration file specifications with association rule learning

System failures resulting from configuration errors are one of the major reasons for the compromised reliability of today's software systems. Although many techniques have been proposed for configuration error detection, these approaches can generally only be applied after an error has occurred. Proactively verifying configuration files is a challenging problem, because 1) software configurations are typically written in poorly structured and untyped “languages”, and 2) specifying rules for configuration verification is challenging in practice. This paper presents ConfigV, a verification framework for general software configurations. Our framework works as follows: in the pre-processing stage, we first automatically derive a specification. Once we have a specification, we check if a given configuration file adheres to that specification. The process of learning a specification works through three steps. First, ConfigV parses a training set of configuration files (not necessarily all correct) into a well-structured and probabilistically-typed intermediate representation. Second, based on the association rule learning algorithm, ConfigV learns rules from these intermediate representations. These rules establish relationships between the keywords appearing in the files. Finally, ConfigV employs rule graph analysis to refine the resulting rules. ConfigV is capable of detecting various configuration errors, including ordering errors, integer correlation errors, type errors, and missing entry errors. We evaluated ConfigV by verifying public configuration files on GitHub, and we show that ConfigV can detect known configuration errors in these files.

[1]  K. Rustan M. Leino,et al.  Dafny: An Automatic Program Verifier for Functional Correctness , 2010, LPAR.

[2]  Herbert A. Simon,et al.  Applications of machine learning and rule induction , 1995, CACM.

[3]  Nizar R. Mabroukeh,et al.  A taxonomy of sequential pattern mining algorithms , 2010, CSUR.

[4]  Tianyin Xu,et al.  EnCore: exploiting system environment and correlation information for misconfiguration detection , 2014, ASPLOS.

[5]  Johannes Gehrke,et al.  Sequential PAttern mining using a bitmap representation , 2002, KDD.

[6]  Ruzica Piskac,et al.  Probabilistic Automated Language Learning for Configuration Files , 2016, CAV.

[7]  François Bobot,et al.  Let’s verify this with Why3 , 2014, International Journal on Software Tools for Technology Transfer.

[8]  Mona Attariyan,et al.  Automating Configuration Troubleshooting with Dynamic Information Flow Analysis , 2010, OSDI.

[9]  Yuanyuan Zhou,et al.  Early Detection of Configuration Errors to Reduce Failure Damage , 2016, USENIX Annual Technical Conference.

[10]  Anna Philippou,et al.  Tools and Algorithms for the Construction and Analysis of Systems , 2018, Lecture Notes in Computer Science.

[11]  Andreas Krause,et al.  Predicting Program Properties from "Big Code" , 2015, POPL.

[12]  Xiao Ma,et al.  An empirical study on configuration errors in commercial and open source systems , 2011, SOSP.

[13]  Mona Attariyan,et al.  AutoBash: improving configuration management with operating system causality analysis , 2007, SOSP.

[14]  Peng Huang,et al.  ConfValley: a systematic configuration validation framework for cloud services , 2015, EuroSys.

[15]  Andreas Zeller Chapter 6 – Scientific Debugging , 2006 .

[16]  Regina Barzilay,et al.  Rationalizing Neural Predictions , 2016, EMNLP.

[17]  Andreas Zeller,et al.  Why Programs Fail: A Guide to Systematic Debugging , 2005 .

[18]  Helen J. Wang,et al.  Automatic Misconfiguration Troubleshooting with PeerPressure , 2004, OSDI.

[19]  Ruzica Piskac,et al.  GRASShopper - Complete Heap Verification with Mixed Specifications , 2014, TACAS.

[20]  Xu Chen,et al.  Declarative configuration management for complex and dynamic networks , 2010, CoNEXT.

[21]  Tomasz Imielinski,et al.  Mining association rules between sets of items in large databases , 1993, SIGMOD Conference.

[22]  Tianyin Xu,et al.  Systems Approaches to Tackling Configuration Errors , 2015, ACM Comput. Surv..

[23]  Albert G. Greenberg,et al.  Configuration management at massive scale: system design and experience , 2007, IEEE Journal on Selected Areas in Communications.

[24]  Junfeng Yang,et al.  Context-based Online Configuration-Error Detection , 2011, USENIX Annual Technical Conference.

[25]  Ion Stoica,et al.  Declarative routing: extensible routing with declarative queries , 2005, SIGCOMM '05.

[26]  Yuanyuan Zhou,et al.  Do not blame users for misconfigurations , 2013, SOSP.

[27]  Long Jin,et al.  Hey, you have given me too many knobs!: understanding and dealing with over-designed configuration in system software , 2015, ESEC/SIGSOFT FSE.

[28]  Jiawei Han,et al.  Frequent pattern mining: current status and future directions , 2007, Data Mining and Knowledge Discovery.

[29]  Mona Attariyan,et al.  X-ray: Automating Root-Cause Diagnosis of Performance Anomalies in Production Software , 2012, OSDI.

[30]  Steven D. Gribble,et al.  Configuration Debugging as Search: Finding the Needle in the Haystack , 2004, OSDI.