Database Workload Characterization with Query Plan Encoders

Smart databases are adopting artificial intelligence (AI) technologies to achieve instance optimality, and in the future, databases will come with prepackaged AI models within their core components. The reason is that every database runs on different workloads, demands specific resources, and settings to achieve optimal performance. It prompts the necessity to understand workloads running in the system along with their features comprehensively, which we dub as workload characterization. To address this workload characterization problem, we propose our query plan encoders that learn essential features and their correlations from query plans. Our pretrained encoders captures the structural and the computational performance of queries independently. We show that our pretrained encoders are adaptable to workloads that expedites the transfer learning process. We performed independent assessments of structural encoder and performance encoders with multiple downstream tasks. For the overall evaluation of our query plan encoders, we architect two downstream tasks (i) query latency prediction and (ii) query classification. These tasks show the importance of feature-based workload characterization. We also performed extensive experiments on individual encoders to verify the effectiveness of representation learning, and domain adaptability. PVLDB Reference Format: Debjyoti Paul, Jie Cao, Feifei Li, Vivek Srikumar . Database Workload Characterization with Query Plan Encoders . PVLDB, (): xxxx-yyyy, . DOI: https://doi.org/TBD

[1]  Eli Upfal,et al.  Learning-based Query Performance Modeling and Prediction , 2012, 2012 IEEE 28th International Conference on Data Engineering.

[2]  Richard J. Beckman,et al.  A Comparison of Three Methods for Selecting Values of Input Variables in the Analysis of Output From a Computer Code , 2000, Technometrics.

[3]  Surajit Chaudhuri,et al.  Robust Estimation of Resource Consumption for SQL Queries using Statistical Techniques , 2012, Proc. VLDB Endow..

[4]  Suprio Ray,et al.  Jackpine: A benchmark to evaluate spatial database performance , 2011, 2011 IEEE 27th International Conference on Data Engineering.

[5]  Peter E. Hart,et al.  Nearest neighbor pattern classification , 1967, IEEE Trans. Inf. Theory.

[6]  Anastasia Ailamaki,et al.  Same Queries, Different Data: Can We Predict Runtime Performance? , 2012, 2012 IEEE 28th International Conference on Data Engineering Workshops.

[7]  Yuqing Zhu,et al.  BestConfig: tapping the performance potential of systems via automatic configuration tuning , 2017, SoCC.

[8]  Shivnath Babu,et al.  Tuning Database Configuration Parameters with iTuned , 2009, Proc. VLDB Endow..

[9]  Michael Stonebraker,et al.  The Implementation of Postgres , 1990, IEEE Trans. Knowl. Data Eng..

[10]  Shivnath Babu,et al.  iTuned: a tool for configuring and visualizing database parameters , 2010, SIGMOD Conference.

[11]  Yiming Yang,et al.  XLNet: Generalized Autoregressive Pretraining for Language Understanding , 2019, NeurIPS.

[12]  Yoshua Bengio,et al.  Extracting and composing robust features with denoising autoencoders , 2008, ICML '08.

[13]  Guoliang Li,et al.  QTune: A Query-Aware Database Tuning System with Deep Reinforcement Learning , 2019, Proc. VLDB Endow..

[14]  Lukasz Kaiser,et al.  Attention is All you Need , 2017, NIPS.

[15]  Albert-László Barabási,et al.  The origin of bursts and heavy tails in human dynamics , 2005, Nature.

[16]  Kevin Knight,et al.  Smatch: an Evaluation Metric for Semantic Feature Structures , 2013, ACL.

[17]  Viktor Leis,et al.  How Good Are Query Optimizers, Really? , 2015, Proc. VLDB Endow..

[18]  Jeffrey Dean,et al.  Distributed Representations of Words and Phrases and their Compositionality , 2013, NIPS.

[19]  Geoffrey E. Hinton,et al.  Reducing the Dimensionality of Data with Neural Networks , 2006, Science.

[20]  Surajit Chaudhuri,et al.  AI Meets AI: Leveraging Query Executions to Improve Index Recommendations , 2019, SIGMOD Conference.

[21]  Geoffrey J. Gordon,et al.  Automatic Database Management System Tuning Through Large-scale Machine Learning , 2017, SIGMOD Conference.

[22]  Feifei Li,et al.  iBTune: Individualized Buffer Tuning for Large-scale Cloud Databases , 2019, Proc. VLDB Endow..

[23]  知秀 柴田 5分で分かる!? 有名論文ナナメ読み:Jacob Devlin et al. : BERT : Pre-training of Deep Bidirectional Transformers for Language Understanding , 2020 .

[24]  Archana Ganapathi,et al.  Predicting Multiple Metrics for Queries: Better Decisions Enabled by Machine Learning , 2009, 2009 IEEE 25th International Conference on Data Engineering.

[25]  Jeffrey F. Naughton,et al.  Predicting query execution time: Are optimizer cost models really unusable? , 2013, 2013 IEEE 29th International Conference on Data Engineering (ICDE).

[26]  Olga Papaemmanouil,et al.  Plan-Structured Deep Neural Network Models for Query Performance Prediction , 2019, Proc. VLDB Endow..

[27]  Tim Kraska,et al.  SageDB: A Learned Database System , 2019, CIDR.

[28]  Surajit Chaudhuri,et al.  Plan Stitch: Harnessing the Best of Many Plans , 2018, Proc. VLDB Endow..

[29]  B.L.P. Baas,et al.  NoSQL spatial: Neo4j versus PostGIS , 2012 .