Cardinality estimation with local deep learning models

Cardinality estimation is a fundamental task in database query processing and optimization. Unfortunately, the accuracy of traditional estimation techniques is poor, resulting in non-optimal query execution plans. With the recent expansion of machine learning into the field of data management, there is a general notion that data analysis, especially with neural networks, can lead to better estimation accuracy. Up to now, all proposed neural network approaches for cardinality estimation follow a global approach that considers the whole database schema at once. These global models suffer from sparse training data, which leads to misestimates for queries that were not represented in the sample space used to generate the training queries. To overcome this issue, we introduce a novel local-oriented approach in this paper, where the local context is a specific sub-part of the schema. As we will show, this leads to a better representation of data correlation and thus to better estimation accuracy. Compared to global approaches, our novel approach achieves an improvement of two orders of magnitude in accuracy and a factor of four in training time performance for local models.
