ConfInLog: Leveraging Software Logs to Infer Configuration Constraints

Misconfigurations have become the dominant cause of software failures in recent years, drawing considerable attention for their increasing prevalence and severity. Configuration constraints can help prevent misconfigurations by defining the conditions that configuration options should satisfy. Documentation is the main source of configuration constraints, but it may be incomplete or inconsistent with the source code. Prior research has therefore focused on obtaining configuration constraints from software source code through static analysis. However, the difficulty of pointer analysis and context comprehension prevents these approaches from collecting accurate and comprehensive constraints. In this paper, we observe that software logs often contain configuration constraints. We conducted an empirical study and summarized patterns of configuration-related log messages. Guided by the study, we designed and implemented ConfInLog, a static tool that infers configuration constraints from log messages. ConfInLog first selects configuration-related log messages from source code using the summarized patterns, then infers constraints from those messages based on the summarized natural language patterns. To evaluate the effectiveness of ConfInLog, we applied it to seven popular open-source software systems. ConfInLog successfully inferred 22 to 163 constraints, of which 59.5% to 61.6% could not be inferred by the state-of-the-art work. Finally, we submitted 67 documentation patches regarding the constraints inferred by ConfInLog. The constraints in 29 patches have been confirmed by the developers, and 10 of those patches have been accepted.
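The two-step idea described above (select configuration-related log messages, then match them against natural language patterns) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the option names, the three regular expressions, and the `Constraint` type are hypothetical and are not the patterns or data structures used by ConfInLog, which operates statically on log statements in source code rather than on plain strings.

```python
import re
from typing import Iterable, List, NamedTuple, Optional, Set


class Constraint(NamedTuple):
    """Hypothetical record for one inferred constraint."""
    option: str
    kind: str   # "min", "max", or "enum"
    value: str


# Illustrative natural language patterns (assumed, not taken from the paper).
PATTERNS = [
    ("max", re.compile(
        r"(?P<opt>[\w.\-]+) must be (?:less than|at most|no more than) (?P<val>\d+)")),
    ("min", re.compile(
        r"(?P<opt>[\w.\-]+) must be (?:greater than|at least) (?P<val>\d+)")),
    ("enum", re.compile(
        r"(?P<opt>[\w.\-]+) must be one of (?P<val>.+)")),
]


def select_config_related(messages: Iterable[str], options: Set[str]) -> List[str]:
    """Step 1: keep only messages that mention a known option name."""
    return [m for m in messages if any(opt in m for opt in options)]


def infer_constraint(message: str) -> Optional[Constraint]:
    """Step 2: return the first pattern match as a Constraint, else None."""
    for kind, pattern in PATTERNS:
        match = pattern.search(message)
        if match:
            return Constraint(match.group("opt"), kind, match.group("val").strip())
    return None


def infer_all(messages: Iterable[str], options: Set[str]) -> List[Constraint]:
    """Run both steps and drop messages that match no pattern."""
    selected = select_config_related(messages, options)
    inferred = (infer_constraint(m) for m in selected)
    return [c for c in inferred if c is not None]
```

Calling `infer_all` with a list of message strings and a set of option names returns the constraints that the hypothetical patterns recognize; messages that mention no known option, or that match no pattern, are skipped.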
