Query optimization

Given a query, there are many access plans that a database management system (DBMS) can follow to process it and produce its answer. All plans are equivalent in terms of their final output but vary in their cost, that is, the amount of time that they need to run. This cost difference can be several orders of magnitude large. Thus all DBMSs have a module that examines “all” alternatives and chooses the plan that needs the least amount of time. This module is called the query optimizer. Query optimization is a large area within the database field and has been surveyed extensively [Jarke and Koch 1984; Mannino et al. 1988]. This short paper emphasizes optimization of a single select-project-join query in a centralized relational DBMS. An abstraction of the query optimization process, divided into a rewriting and a planning stage, is shown in Figure 1. The functionality of each module in Figure 1 is described in the following.

[1]  Patricia G. Selinger,et al.  Access path selection in a relational database management system , 1979, SIGMOD '79.

[2]  Robert Kooi,et al.  The Optimization of Queries in Relational Databases , 1980 .

[3]  Jonathan J. King QUIST: A System for Semantic Query Optimization in Relational Databases , 1981, VLDB.

[4]  Eugene Wong,et al.  Query processing in a system for distributed databases (SDD-1) , 1981, TODS.

[5]  C. D. Gelatt,et al.  Optimization by Simulated Annealing , 1983, Science.

[6]  Stavros Christodoulakis,et al.  Implications of certain assumptions in database performance evauation , 1984, TODS.

[7]  Matthias Jarke,et al.  Query Optimization in Database Systems , 1984, CSUR.

[8]  Gregory Piatetsky-Shapiro,et al.  Accurate estimation of the number of tuples satisfying a condition , 1984, SIGMOD '84.

[9]  Toshihide Ibaraki,et al.  On the optimal nesting order for computing N-relational joins , 1984, TODS.

[10]  Guy M. Lohman,et al.  Optimizer Validation and Performance Evaluation for Distributed Queries , 1998 .

[11]  Carlo Zaniolo,et al.  Optimization of Nonrecursive Queries , 1986, VLDB.

[12]  Sartaj Sahni,et al.  Simulated Annealing and Combinatorial Optimization , 1986, 23rd ACM/IEEE Design Automation Conference.

[13]  Doron Rotem,et al.  Simple Random Sampling from Relational Databases , 1986, VLDB.

[14]  David J. DeWitt,et al.  The EXODUS optimizer generator , 1987, SIGMOD '87.

[15]  Eugene Wong,et al.  Query optimization by simulated annealing , 1987, SIGMOD '87.

[16]  David E. Goldberg,et al.  Genetic Algorithms in Search Optimization and Machine Learning , 1988 .

[17]  Timos K. Sellis,et al.  Multiple-query optimization , 1988, TODS.

[18]  Arun N. Swami,et al.  Optimization of large join queries , 1988, SIGMOD '88.

[19]  David J. DeWitt,et al.  Equi-depth multidimensional histograms , 1988, SIGMOD '88.

[20]  Michael V. Mannino,et al.  Statistical profile estimation in database systems , 1988, CSUR.

[21]  Karen Ward,et al.  Dynamic query evaluation plans , 1989, SIGMOD '89.

[22]  Stéphane Lafortune,et al.  An Intelligent Search Method for Query Optimization by Semijoins , 1989, IEEE Trans. Knowl. Data Eng..

[23]  Arun N. Swami,et al.  Optimization of large join queries: combining heuristics and combinatorial techniques , 1989, SIGMOD '89.

[24]  David J. DeWitt,et al.  Tradeoffs in Processing Complex Join Queries via Hashing in Multiprocessor Database Machines , 1990, VLDB.

[25]  Yannis E. Ioannidis,et al.  Randomized algorithms for optimizing large join queries , 1990, SIGMOD '90.

[26]  Jeffrey F. Naughton,et al.  Practical selectivity estimation through adaptive sampling , 1990, SIGMOD '90.

[27]  Guy M. Lohman,et al.  Measuring the Complexity of Join Enumeration in Query Optimization , 1990, VLDB.

[28]  Stavros Christodoulakis,et al.  On the propagation of errors in the size of join results , 1991, SIGMOD '91.

[29]  Younkyung Cha Kang Randomized Algorithms for Query Optimization , 1991 .

[30]  Michael C. Ferris,et al.  A Genetic Algorithm for Database Query Optimization , 1991, ICGA.

[31]  Michael Stonebraker,et al.  Optimization of parallel query execution plans in XPRS , 1991, [1991] Proceedings of the First International Conference on Parallel and Distributed Information Systems.

[32]  Peter J. Haas,et al.  Sequential sampling procedures for query size estimation , 1992, SIGMOD '92.

[33]  G. Antoshenkov,et al.  Dynamic query optimization in Rdb/VMS , 1993, Proceedings of IEEE 9th International Conference on Data Engineering.

[34]  Goetz Graefe,et al.  The Volcano optimizer generator: extensibility and efficient search , 1993, Proceedings of IEEE 9th International Conference on Data Engineering.

[35]  Patrick Valduriez,et al.  On the Effectiveness of Optimization Search Strategies for Parallel Execution Spaces , 1993, VLDB.

[36]  Kian-Lee Tan,et al.  Multi-Join Optimization for Symmetric Multiprocessors , 1993, VLDB.

[37]  Stavros Christodoulakis,et al.  Optimal histograms for limiting worst-case error propagation in the size of join results , 1993, TODS.

[38]  Balakrishna R. Iyer,et al.  A polynomial time algorithm for optimizing join queries , 1993, Proceedings of IEEE 9th International Conference on Data Engineering.

[39]  Martin L. Kersten,et al.  Fast, Randomized Join-Order Selection - Why Use Transformations? , 1994, VLDB.

[40]  Goetz Graefe,et al.  Optimization of dynamic query evaluation plans , 1994, SIGMOD '94.

[41]  Yannis E. Ioannidis,et al.  Balancing histogram optimality and practicality for query result size estimation , 1995, SIGMOD '95.

[42]  P.J. Haas,et al.  Sampling-based selectivity estimation for joins using augmented frequent value statistics , 1995, Proceedings of the Eleventh International Conference on Data Engineering.

[43]  Guy M. Lman Grammar-like Functional Rules for Representing Query Optimization Alternatives , 1998 .