COMPASS: Online Sketch-based Query Optimization for In-Memory Databases

Cost-based query optimization remains a critical task in relational databases even after decades of research and industrial development. Query optimizers rely on a wide range of statistical synopses for accurate cardinality estimation. As the complexity of selections and the number of join predicates increase, two problems arise. First, statistics cannot be incrementally composed to effectively estimate the cost of the sub-plans generated during plan enumeration. Second, small estimation errors propagate exponentially through joins, which can lead to severely sub-optimal plans. In this paper, we introduce COMPASS, a novel query optimization paradigm for in-memory databases based on a single type of statistics: Fast-AGMS sketches. In COMPASS, query optimization and execution are intertwined. Selection predicates and sketch updates are pushed down and evaluated online during query optimization. This allows Fast-AGMS sketches to be computed only over the relevant tuples, which improves cardinality estimation accuracy. Plan enumeration is performed over the query join graph by incrementally composing attribute-level sketches, rather than by building a separate sketch for every sub-plan. We prototype COMPASS in MapD, an open-source parallel database, and perform extensive experiments over the complete Join Order Benchmark (JOB). The results show that COMPASS generates better execution plans, both in terms of cardinality and runtime, than four other database systems. Overall, COMPASS achieves a speedup ranging from 1.35X to 11.28X in cumulative query execution time over the considered competitors.
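The following Python sketch illustrates two mechanisms the abstract relies on. First, a Fast-AGMS sketch is built on a join attribute, and the selection predicate is evaluated before each sketch update. Second, the size of a two-way join is estimated from two sketches that share hash and sign functions. This is a minimal, single-threaded illustration under stated assumptions, not the COMPASS implementation inside MapD. The names `SketchSeeds`, `FastAGMS`, and `sketch_relation`, the integer join keys, the modular-polynomial hash families, and the toy relations are all hypothetical choices made for this example. The sketch does not attempt the incremental composition of attribute-level sketches for multi-way joins.

```python
import random
import statistics
from collections import Counter
from typing import Callable, Iterable, Tuple

_P = (1 << 31) - 1  # Mersenne prime used by both (hypothetical) hash families


class SketchSeeds:
    """Bucket-hash and sign seeds shared by every sketch that will be joined.

    Two Fast-AGMS sketches can only be multiplied when they were built with
    identical hash and sign functions, so the seeds are kept apart from the
    counters (a hypothetical design choice for this example).
    """

    def __init__(self, depth: int, width: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.depth, self.width = depth, width
        # Row r bucket hash: ((a*x + b) mod P) mod width, a 2-universal-style family.
        self.bucket = [(rng.randrange(1, _P), rng.randrange(_P)) for _ in range(depth)]
        # Row r sign: low bit of a random cubic polynomial mod P (~4-wise independent).
        self.sign = [tuple(rng.randrange(_P) for _ in range(4)) for _ in range(depth)]

    def h(self, r: int, key: int) -> int:
        a, b = self.bucket[r]
        return ((a * key + b) % _P) % self.width

    def xi(self, r: int, key: int) -> int:
        c0, c1, c2, c3 = self.sign[r]
        v = (((c3 * key + c2) * key + c1) * key + c0) % _P  # Horner evaluation
        return 1 if v & 1 else -1


class FastAGMS:
    """depth x width counters; each tuple updates exactly one counter per row."""

    def __init__(self, seeds: SketchSeeds) -> None:
        self.seeds = seeds
        self.counters = [[0] * seeds.width for _ in range(seeds.depth)]

    def update(self, key: int, count: int = 1) -> None:
        for r in range(self.seeds.depth):
            self.counters[r][self.seeds.h(r, key)] += self.seeds.xi(r, key) * count

    def join_size(self, other: "FastAGMS") -> float:
        """Median over rows of the inner product of matching counter rows."""
        if other.seeds is not self.seeds:
            raise ValueError("sketches must share hash and sign functions")
        per_row = [
            sum(x * y for x, y in zip(mine, theirs))
            for mine, theirs in zip(self.counters, other.counters)
        ]
        return statistics.median(per_row)


def sketch_relation(
    rows: Iterable[Tuple],
    key_of: Callable[[Tuple], int],
    predicate: Callable[[Tuple], bool],
    seeds: SketchSeeds,
) -> FastAGMS:
    """Single scan that evaluates the selection first, so only qualifying
    tuples reach the sketch (a simplified form of the pushed-down selection)."""
    sketch = FastAGMS(seeds)
    for row in rows:
        if predicate(row):
            sketch.update(key_of(row))
    return sketch


if __name__ == "__main__":
    seeds = SketchSeeds(depth=7, width=4096, seed=42)  # odd depth: median is one row's value
    # Hypothetical toy relations: R(key, year) and S(key).
    R = [(i % 10, 1990 + i % 30) for i in range(1000)]
    S = [(i % 10,) for i in range(500)]
    r_sk = sketch_relation(R, lambda t: t[0], lambda t: t[1] >= 2010, seeds)
    s_sk = sketch_relation(S, lambda t: t[0], lambda t: True, seeds)
    # Exact size of sigma_{year >= 2010}(R) join S, computed for comparison.
    fr = Counter(k for k, y in R if y >= 2010)
    fs = Counter(k for (k,) in S)
    exact = sum(fr[k] * fs[k] for k in fr)
    print("estimate:", r_sk.join_size(s_sk), "exact:", exact)
```

Each tuple touches one counter per row, so an update costs O(depth). The per-row inner products are combined with a median, which is the usual Fast-AGMS join-size estimator. Because the filter runs before the update, tuples rejected by `year >= 2010` never reach the counters. In this simplified setting, that illustrates the accuracy benefit the abstract attributes to pushing selections down ahead of sketch construction.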
