CSMiner: An Automated Tool for Analyzing Changes in Configuration Settings across Multiple Versions of Large Scale Cloud Software

As software evolves, the number of configuration settings and their usage scenarios often change as well, which can cause system misconfiguration and performance degradation. However, no tool exists today that helps system administrators and developers answer questions such as "What are the new configuration settings in this new version?" or "Where and how is setting X used in the new version?". Because answering these questions manually is almost impossible given the number of settings and the size of the software, this paper investigates the design of an automated tool, CSMiner, that leverages static program analysis techniques to help users understand how and where a particular setting is used in a program and how settings have evolved across different versions of a software system. CSMiner was applied to four open source software packages, namely Apache Cassandra, ElasticSearch, Apache Hadoop, and Apache HBase, and identified 109 (out of 109), 109 (out of 113), 811 (out of 847), and 160 (out of 167) settings for these packages, respectively. In each case, CSMiner identified the changes in configuration settings across multiple versions with high accuracy.
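
To make the version-comparison question concrete, the following is a minimal sketch of only the final comparison step, assuming the set of setting names for each version has already been extracted by some other means. It is not CSMiner's implementation and does not perform any static analysis; the function name `diff_settings` and all setting names are hypothetical and chosen for illustration.

```python
from typing import Dict, Set


def diff_settings(old: Set[str], new: Set[str]) -> Dict[str, Set[str]]:
    """Classify setting names as added, removed, or retained between two versions."""
    return {
        "added": new - old,      # present only in the newer version
        "removed": old - new,    # present only in the older version
        "retained": old & new,   # present in both versions
    }


# Hypothetical setting names, for illustration only (not taken from any real package).
settings_v1 = {"cache.size", "rpc.timeout", "log.level"}
settings_v2 = {"cache.size", "rpc.timeout.ms", "log.level", "compaction.threads"}

changes = diff_settings(settings_v1, settings_v2)
for kind in ("added", "removed", "retained"):
    print(kind, sorted(changes[kind]))
```

A plain set difference treats a renamed setting as one removal plus one addition; recognizing renames or changes in how a setting is used would require the kind of program analysis the abstract describes.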
