Building Query Optimizers with Combinators

Query optimizers generate plans to retrieve data requested by queries. Optimizers are hard to build because, for any given query, there can be a prohibitively large number of plans to choose from. Typically, the complexity of optimization is handled by dividing it into two phases: a heuristic phase (called query rewriting) that narrows the space of plans to consider, and a cost-based phase that compares the relative merits of plans that lie in the narrowed space. The goal of query rewriting is to transform queries into equivalent queries that are more amenable to plan generation. This process has proven to be error-prone. Rewrites over nested queries and queries returning duplicates have been especially problematic, as evidenced by the well-known COUNT bug in Kim's unnesting rewrites. The advent of object-oriented and object-relational databases only exacerbates this issue by introducing more complex data and, by implication, more complex queries and query rewrites.

This thesis addresses the correctness issue for query rewriting. We introduce a novel framework (COKO-KOLA) for expressing query rewrites that can be verified with an automated theorem prover. At its foundation lies KOLA, our combinator-based query algebra that permits expression of simple query rewrites (rewrite rules) without imperative code. While rewrite rules are easily verified, they lack the expressivity to capture many query rewrites used in practice. We address this issue in two ways: (1) We introduce a language (COKO) to express complex query transformations using KOLA rule sets and an algorithm to control rule firing. COKO supports expression of query rewrites that are too general to be expressed with rewrite rules alone. (2) We extend KOLA to permit expression of rewrite rules whose firing requires inferring semantic conditions. This extension permits expression of query rewrites that are too specific to be expressed with rewrite rules alone.

The recurring theme of this work is that all of the proposed techniques are made possible by a combinator-based representation of queries.
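To make the idea of declarative rewrite rules over a variable-free (combinator) representation more concrete, the following is a minimal sketch in Python. It is not KOLA or COKO syntax: the term encoding (nested tuples), the combinator names `compose`, `map`, and `id`, the `?`-prefixed pattern variables, and the three rules are all assumptions chosen for illustration. The point it tries to show is that a rule is just a pair of patterns, so it can be applied by plain matching and substitution with no rule-specific imperative code, while a separate (here very naive) loop decides when rules fire.

```python
# Hypothetical encoding, not KOLA syntax: a term is an atom (a string such as
# "name") or a tuple whose first element names a combinator, e.g.
# ("compose", f, g) or ("map", f). Strings starting with "?" are pattern variables.

def match(pattern, term, env):
    """Return bindings extending env if pattern matches term, else None."""
    if isinstance(pattern, str) and pattern.startswith("?"):
        if pattern in env:
            return env if env[pattern] == term else None
        return {**env, pattern: term}
    if (isinstance(pattern, tuple) and isinstance(term, tuple)
            and len(pattern) == len(term)):
        for p, t in zip(pattern, term):
            env = match(p, t, env)
            if env is None:
                return None
        return env
    return env if pattern == term else None


def substitute(template, env):
    """Instantiate a rule's right-hand side with the matched bindings."""
    if isinstance(template, str):
        return env.get(template, template)
    return tuple(substitute(t, env) for t in template)


# Illustrative rules only (assumed, not taken from the thesis).
RULES = [
    # map(f) . map(g)  ->  map(f . g)
    (("compose", ("map", "?f"), ("map", "?g")),
     ("map", ("compose", "?f", "?g"))),
    # f . id  ->  f
    (("compose", "?f", "id"), "?f"),
    # id . f  ->  f
    (("compose", "id", "?f"), "?f"),
]


def rewrite_once(term, rules):
    """Fire the first rule that matches at the root, else try the subterms.

    Returns (new_term, changed).
    """
    for lhs, rhs in rules:
        env = match(lhs, term, {})
        if env is not None:
            return substitute(rhs, env), True
    if isinstance(term, tuple):
        # Index 0 holds the combinator name, so only the arguments are visited.
        for i in range(1, len(term)):
            new_child, changed = rewrite_once(term[i], rules)
            if changed:
                return term[:i] + (new_child,) + term[i + 1:], True
    return term, False


def normalize(term, rules):
    """Naive firing strategy: keep rewriting until no rule applies."""
    changed = True
    while changed:
        term, changed = rewrite_once(term, rules)
    return term


query = ("compose", ("map", "name"),
         ("compose", ("map", "id"), ("map", "owner")))
print(normalize(query, RULES))
```

Because the terms contain no bound variables, matching never has to reason about variable capture or scope, which is one plausible reading of why a combinator representation keeps such rules simple. The `normalize` loop is only a stand-in for a firing strategy; it does not model the control algorithms or the semantic-condition inference described above.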
