Optimization techniques for queries with expensive methods

Object-relational database management systems allow knowledgeable users to define new data types as well as new methods (operators) for the types. This flexibility produces an attendant complexity, which must be handled in new ways for an object-relational database management system to be efficient. In this article we study techniques for optimizing queries that contain time-consuming methods. The focus of traditional query optimizers has been on the choice of join methods and orders; selections have been handled by “pushdown” rules. These rules apply selections in an arbitrary order before as many joins as possible, using the assumption that selection takes no time. However, users of object-relational systems can embed complex methods in selections. Thus selections may take significant amounts of time, and the query optimization model must be enhanced. In this article we carefully define a query cost framework that incorporates both selectivity and cost estimates for selections. We develop an algorithm called Predicate Migration, and prove that it produces optimal plans for queries with expensive methods. We then describe our implementation of Predicate Migration in the commercial object-relational database management system Illustra, and discuss practical issues that affect our earlier assumptions. We compare Predicate Migration to a variety of simpler optimization techniques, and demonstrate that Predicate Migration is the best general solution to date. The alternative techniques we present may be useful for constrained workloads.
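
To illustrate why a cost framework needs both selectivity and cost estimates for selections, the following minimal sketch orders a set of selections on a single table by the ratio (selectivity - 1) / cost, a classical sequencing rule for independent filters. It is an illustration under stated assumptions, not the full Predicate Migration algorithm: the `Predicate` class, the function names, and the sample numbers are all hypothetical, predicates are assumed to be independent, and joins are ignored entirely.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Predicate:
    """Hypothetical description of one selection predicate."""
    name: str
    cost: float         # assumed per-tuple evaluation cost, must be > 0
    selectivity: float  # assumed fraction of tuples that pass, in [0, 1]


def rank(p: Predicate) -> float:
    """Ratio used for ordering: more negative means 'apply earlier'."""
    return (p.selectivity - 1.0) / p.cost


def order_by_rank(preds):
    """Return the predicates sorted by ascending rank."""
    return sorted(preds, key=rank)


def expected_cost(preds) -> float:
    """Expected per-input-tuple cost of applying preds in the given order.

    Each predicate is only evaluated on the fraction of tuples that
    survived all earlier predicates (independence is assumed).
    """
    total = 0.0
    surviving = 1.0
    for p in preds:
        total += surviving * p.cost
        surviving *= p.selectivity
    return total


# Illustrative numbers only; they are not taken from the article.
preds = [
    Predicate("cheap_unselective", cost=1.0, selectivity=0.9),
    Predicate("expensive_selective", cost=100.0, selectivity=0.1),
    Predicate("cheap_selective", cost=1.0, selectivity=0.1),
]

ordered = order_by_rank(preds)
best_by_search = min(permutations(preds), key=expected_cost)
```

With these sample values the rank ordering matches the cheapest ordering found by exhaustive search, whereas an arbitrary ordering that ignores cost could evaluate the expensive predicate on every tuple.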
