openGauss: An Autonomous Database System

Although learning-based database optimization techniques have been studied in academia in recent years, they have not been widely deployed in commercial database systems. In this work, we build an autonomous database framework and integrate our proposed learning-based database techniques into openGauss, an open-source database system. We propose effective learning-based models to build learned optimizers (including learned query rewrite, learned cost/cardinality estimation, learned join order selection, and physical operator selection) and learned database advisors (including self-monitoring, self-diagnosis, self-configuration, and self-optimization). We devise a validation model to verify the effectiveness of learned models. We build training data management and model management platforms so that learned models can be deployed easily. We have evaluated our techniques on real-world datasets, and the experimental results confirm that the techniques are effective. We also share the lessons we learned from deploying learning-based techniques.

PVLDB Reference Format:
Guoliang Li, Xuanhe Zhou, Ji Sun, Xiang Yu, Yue Han, Lianyuan Jin, Wenbo Li, Tianqing Wang, Shifu Li. openGauss: An Autonomous Database System. PVLDB, 14(12): 3028-3041, 2021.
doi:10.14778/3476311.3476380

PVLDB Artifact Availability:
https://gitee.com/opengauss/openGauss-AI