Too Many Knobs to Tune? Towards Faster Database Tuning by Pre-selecting Important Knobs

Recent research has shown that automatically tuning the configuration knobs in database systems is important for achieving high performance. However, because database systems usually have hundreds of knobs, auto-tuning frameworks spend a significant amount of time exploring the large configuration space and must repeat this exploration as workloads change. Given this challenge, we ask a more fundamental question: how many knobs do we need to tune to achieve good performance? Surprisingly, we find that with YCSB workload-A on Cassandra, tuning just five knobs achieves 99% of the performance of the best configuration obtained by tuning many knobs. We also show that our results hold across workloads and apply to other systems such as PostgreSQL, motivating the need for tools that can automatically identify the knobs that need to be tuned. Based on our results, we propose an initial design for accelerating auto-tuners and detail some future research directions.
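
As a rough illustration of what pre-selecting knobs could look like, the sketch below ranks knobs by how strongly their sampled values correlate with measured performance and keeps the top few. It is not the method proposed in this work: the ranking criterion (absolute Pearson correlation, which only captures linear effects), the knob names, and the `toy_throughput` function standing in for a real benchmark run are all assumptions made for the example.

```python
import random
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def rank_knobs(samples, scores):
    """Order knob names from most to least important.

    samples: list of dicts mapping knob name -> numeric value
    scores:  measured performance for each sampled configuration
    """
    knobs = sorted(samples[0])
    importance = {
        k: abs(pearson([s[k] for s in samples], scores)) for k in knobs
    }
    return sorted(knobs, key=lambda k: importance[k], reverse=True)


def toy_throughput(cfg):
    # Hypothetical stand-in for running a benchmark: only two knobs matter.
    return 100.0 * cfg["knob_0"] + 60.0 * cfg["knob_3"]


# Sample random configurations over eight hypothetical knobs in [0, 1).
rng = random.Random(0)
KNOBS = [f"knob_{i}" for i in range(8)]
samples = [{k: rng.random() for k in KNOBS} for _ in range(500)]
scores = [toy_throughput(s) for s in samples]

# Keep only the highest-ranked knobs; an auto-tuner would then search
# this reduced space instead of all eight dimensions.
top2 = rank_knobs(samples, scores)[:2]
print(top2)
```

A real filter would likely need a criterion that also captures non-linear effects and interactions between knobs, and would measure performance by actually running the workload against the system.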
