ConfSeer: Leveraging Customer Support Knowledge Bases for Automated Misconfiguration Detection

We introduce ConfSeer, an automated system that detects potential configuration issues or deviations from identified best practices by leveraging a knowledge base (KB) of technical solutions. The intuition is that these KB articles describe configuration problems and their fixes, so if the system can accurately understand them, it can automatically pinpoint both the errors and their resolutions. Unfortunately, finding an accurate match is difficult because (a) the KB articles are written in natural language text, and (b) configuration files typically contain a large number of parameters, each with a wide range of possible values. Expert-driven manual troubleshooting therefore does not scale. While several state-of-the-art techniques have been proposed for individual tasks such as keyword matching, concept determination and entity resolution, none offers a practical end-to-end solution for detecting problems in machine configurations. In this paper, we describe our experiences building ConfSeer using a novel combination of ideas from natural language processing, information retrieval and interactive learning. ConfSeer powers the recommendation engine behind Microsoft Operations Management Suite, which proposes fixes for software configuration errors. The system has been running in production for about a year, proactively finding misconfigurations on tens of thousands of servers. Our evaluation of ConfSeer against an expert-defined rule-based commercial system, an expert survey and web search engines shows that it achieves 80%-97.5% accuracy and incurs low runtime overhead.
