Exact Cardinality Query Optimization for Optimizer Testing

The accuracy of cardinality estimates is crucial for obtaining a good query execution plan. Today's optimizers make several simplifying assumptions during cardinality estimation that can lead to large errors and hence poor plans. In a scenario such as query optimizer testing it is very desirable to obtain the "best" plan, i.e., the plan produced when the cardinality of each relevant expression is exact. Such a plan serves as a baseline against which plans produced by using the existing cardinality estimation module in the query optimizer can be compared. However, obtaining all exact cardinalities by executing appropriate subexpressions can be prohibitively expensive. In this paper, we present a set of techniques that makes exact cardinality query optimization a viable option for a significantly larger set of queries than previously possible. We have implemented this functionality in Microsoft SQL Server and we present results using the TPC-H benchmark queries that demonstrate their effectiveness.

[1]  Volker Markl,et al.  LEO - DB2's LEarning Optimizer , 2001, VLDB.

[2]  Hamid Pirahesh,et al.  Robust query processing through progressive optimization , 2004, SIGMOD '04.

[3]  Surajit Chaudhuri,et al.  Automated Selection of Materialized Views and Indexes in SQL Databases , 2000, VLDB.

[4]  David J. DeWitt,et al.  Efficient mid-query re-optimization of sub-optimal query execution plans , 1998, SIGMOD '98.

[5]  Prasan Roy,et al.  Efficient and extensible algorithms for multi query optimization , 1999, SIGMOD '00.

[6]  S. Sudarshan,et al.  Pipelining in multi-query optimization , 2001, PODS '01.

[7]  David S. Johnson,et al.  Computers and Intractability: A Guide to the Theory of NP-Completeness , 1978 .

[8]  Yannis E. Ioannidis,et al.  The History of Histograms (abridged) , 2003, VLDB.

[9]  Vijay V. Vazirani,et al.  Approximation Algorithms , 2001, Springer Berlin Heidelberg.

[10]  Peter J. Haas,et al.  Automated Statistics Collection in DB2 UDB , 2004, VLDB.

[11]  Sam Lightstone,et al.  DB2 Design Advisor: Integrated Automatic Physical Database Design , 2004, VLDB.

[12]  Surajit Chaudhuri,et al.  Database Tuning Advisor for Microsoft SQL Server 2005 , 2004, VLDB.

[13]  Surajit Chaudhuri,et al.  Automating Statistics Management for Query Optimizers , 2001, IEEE Trans. Knowl. Data Eng..

[14]  Yannis Sismanis,et al.  Shared Index Scans for Data Warehouses , 2001, DaWaK.

[15]  Volker Markl,et al.  Collecting and Maintaining Just-in-Time Statistics , 2007, 2007 IEEE 23rd International Conference on Data Engineering.

[16]  Jeffrey F. Naughton,et al.  Practical selectivity estimation through adaptive sampling , 1990, SIGMOD '90.

[17]  Nick Roussopoulos,et al.  Adaptive selectivity estimation using query feedback , 1994, SIGMOD '94.

[18]  Goetz Graefe The Cascades Framework for Query Optimization , 1995, IEEE Data Eng. Bull..

[19]  J. S. Saini,et al.  Adaptive Query Processing , 2006 .

[20]  Stavros Christodoulakis,et al.  On the propagation of errors in the size of join results , 1991, SIGMOD '91.

[21]  Surajit Chaudhuri,et al.  Exploiting statistics on query expressions for optimization , 2002, SIGMOD '02.

[22]  Timos K. Sellis,et al.  Multiple-query optimization , 1988, TODS.

[23]  KabraNavin,et al.  Efficient mid-query re-optimization of sub-optimal query execution plans , 1998 .