Smooth Scan: Statistics-oblivious access paths

Query optimizers depend heavily on statistics representing column distributions to create efficient query plans. In many cases, though, statistics are outdated or non-existent, and the process of refreshing statistics is very expensive, especially for ad-hoc workloads on ever bigger data. This results in suboptimal plans that severely hurt performance. The main problem is that any decision, once made by the optimizer, is fixed throughout the execution of a query. In particular, each logical operator translates into a fixed choice of a physical operator at run-time. In this paper, we advocate for continuous adaptation and morphing of physical operators throughout their lifetime, by adjusting their behavior in accordance with the statistical properties of the data. We demonstrate the benefits of the new paradigm by designing and implementing an adaptive access path operator called Smooth Scan, which morphs continuously within the space of traditional index access and full table scan. Smooth Scan behaves similarly to an index scan for low selectivity; if selectivity increases, however, Smooth Scan progressively morphs its behavior toward a sequential scan. As a result, a system with Smooth Scan requires no access path decisions up front nor does it need accurate statistics to provide good performance. We implement Smooth Scan in PostgreSQL and, using both synthetic benchmarks as well as TPC-H, we show that it achieves robust performance while at the same time being statistics-oblivious.

[1]  Timos K. Sellis,et al.  Parametric query optimization , 1992, The VLDB Journal.

[2]  Stavros Christodoulakis,et al.  Implications of certain assumptions in database performance evauation , 1984, TODS.

[3]  Guy M. Lohman,et al.  R* optimizer validation and performance evaluation for local queries , 1986, SIGMOD '86.

[4]  Volker Markl,et al.  LEO - DB2's LEarning Optimizer , 2001, VLDB.

[5]  Surajit Chaudhuri,et al.  Towards a robust query optimizer: a principled and practical approach , 2005, SIGMOD '05.

[6]  David J. DeWitt,et al.  Efficient mid-query re-optimization of sub-optimal query execution plans , 1998, SIGMOD '98.

[7]  Harumi A. Kuno,et al.  Robust Query Processing (Dagstuhl Seminar 12321) , 2012, Dagstuhl Reports.

[8]  Jayant R. Haritsa,et al.  Plan bouquets: query processing without selectivity estimation , 2014, SIGMOD Conference.

[9]  Surajit Chaudhuri,et al.  Self-tuning histograms: building histograms without looking at data , 1999, SIGMOD '99.

[10]  Martin L. Kersten,et al.  Database Cracking , 2007, CIDR.

[11]  Joseph M. Hellerstein,et al.  Eddies: continuously adaptive query processing , 2000, SIGMOD '00.

[12]  A. N. Wilschut,et al.  Dataflow query execution in a parallel main-memory environment , 1991, Distributed and Parallel Databases.

[13]  David J. DeWitt,et al.  Proactive re-optimization , 2005, SIGMOD '05.

[14]  Renata Borovica-Gajic,et al.  Smooth Scan: Robust Query Execution with a Statistics-oblivious Access Operator , 2014 .

[15]  Hamid Pirahesh,et al.  Robust query processing through progressive optimization , 2004, SIGMOD '04.

[16]  Mohamed Ziauddin,et al.  Query processing and optimization in Oracle Rdb , 1996, The VLDB Journal.

[17]  Jayant R. Haritsa,et al.  Identifying robust plans through plan diagram reduction , 2008, Proc. VLDB Endow..

[18]  Goetz Graefe,et al.  New algorithms for join and grouping operations , 2012, Computer Science - Research and Development.

[19]  Torsten Grabs,et al.  Execution strategies for SQL subqueries , 2007, SIGMOD '07.

[20]  Surajit Chaudhuri,et al.  A pay-as-you-go framework for query execution feedback , 2008, Proc. VLDB Endow..

[21]  Yannis E. Ioannidis,et al.  Query optimization , 1996, CSUR.

[22]  Alon Y. Halevy,et al.  Adapting to source properties in processing data integration queries , 2004, SIGMOD '04.

[23]  Jayant R. Haritsa,et al.  On the Production of Anorexic Plan Diagrams , 2007, VLDB.

[24]  Volker Markl,et al.  10381 Summary and Abstracts Collection - Robust Query Processing , 2010, Robust Query Processing.

[25]  Karen Ward,et al.  Dynamic query evaluation plans , 1989, SIGMOD '89.

[26]  Surajit Chaudhuri,et al.  Exact Cardinality Query Optimization for Optimizer Testing , 2009, Proc. VLDB Endow..

[27]  Zhi Cai,et al.  Size-l Object Summaries for Relational Keyword Search , 2011, Proc. VLDB Endow..

[28]  Harumi A. Kuno,et al.  Visualizing the robustness of query execution , 2009, CIDR.

[29]  Donald D. Chamberlin,et al.  Access Path Selection in a Relational Database Management System , 1989 .

[30]  Harumi A. Kuno,et al.  Merging What's Cracked, Cracking What's Merged: Adaptive Indexing in Main-Memory Column-Stores , 2011, Proc. VLDB Endow..

[31]  Anastasia Ailamaki,et al.  Adaptive Query Processing on RAW Data , 2014, Proc. VLDB Endow..

[32]  Harumi A. Kuno,et al.  Modern B-tree techniques , 2011, 2011 IEEE 27th International Conference on Data Engineering.

[33]  David J. DeWitt,et al.  Nested loops revisited , 1993, [1993] Proceedings of the Second International Conference on Parallel and Distributed Information Systems.

[34]  Harumi A. Kuno,et al.  Adaptive indexing for relational keys , 2010, 2010 IEEE 26th International Conference on Data Engineering Workshops (ICDEW 2010).

[35]  Anastasia Ailamaki,et al.  Automated physical designers: what you see is (not) what you get , 2012, DBTest '12.

[36]  Thomas Neumann,et al.  TPC-H Analyzed: Hidden Messages and Lessons Learned from an Influential Benchmark , 2013, TPCTC.