Query execution techniques for caching expensive methods

Object-Relational and Object-Oriented DBMSs allow users to invoke time-consuming ("expensive") methods in their queries. When queries containing these expensive methods run on data with duplicate values, time is wasted computing the same method redundantly on the same value. This problem has been studied in the context of programming languages, where "memoization" is the standard solution. In the database literature, sorting has been proposed to deal with it. We compare these two approaches with a third solution, a variant of unary hybrid hashing that we call Hybrid Cache. We demonstrate that Hybrid Cache always dominates memoization and significantly outperforms sorting in many instances. This provides new insights into the tradeoff between hashing and sorting for unary operations. Additionally, our Hybrid Cache algorithm includes some new optimizations for unary hybrid hashing, which can also be used in other applications such as grouping and duplicate elimination. We conclude with a discussion of techniques for caching multiple expensive methods in a single query, and raise some new optimization problems in choosing among caching techniques.
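To make the three strategies concrete, here is a minimal Python sketch, assuming a pure, deterministic method and argument values that are hashable (and orderable, for sorting). The function names are hypothetical. `partitioned_cache_apply` is a simplified hash-partitioning scheme in the spirit of unary hybrid hashing, not the Hybrid Cache algorithm itself; its "spill files" are simulated with in-memory lists.

```python
from typing import Callable, Dict, List, Sequence, Tuple, TypeVar

K = TypeVar("K")  # method argument: hashable (and orderable for sort_apply)
R = TypeVar("R")  # method result


def memoize_apply(args: Sequence[K], method: Callable[[K], R]) -> Tuple[List[Tuple[K, R]], int]:
    """Memoization: one in-memory table maps each argument to its result.
    Preserves input order; assumes the whole table fits in memory."""
    cache: Dict[K, R] = {}
    out: List[Tuple[K, R]] = []
    calls = 0
    for a in args:
        if a not in cache:
            cache[a] = method(a)
            calls += 1
        out.append((a, cache[a]))
    return out, calls


def sort_apply(args: Sequence[K], method: Callable[[K], R]) -> Tuple[List[Tuple[K, R]], int]:
    """Sorting: duplicates become adjacent, so only the last result is kept.
    Output comes back in sorted order, not input order."""
    out: List[Tuple[K, R]] = []
    calls = 0
    have_last = False
    last_arg = last_result = None
    for a in sorted(args):
        if not have_last or a != last_arg:
            last_arg, last_result, have_last = a, method(a), True
            calls += 1
        out.append((a, last_result))
    return out, calls


def partitioned_cache_apply(
    args: Sequence[K], method: Callable[[K], R], num_partitions: int = 4
) -> Tuple[List[Tuple[K, R]], int]:
    """Hybrid-hashing-style sketch: partition 0 is cached in memory during the
    first pass; the other partitions are buffered (standing in for disk spill
    files) and later processed one at a time, each with its own smaller cache."""
    spilled: List[List[K]] = [[] for _ in range(num_partitions)]
    out: List[Tuple[K, R]] = []
    calls = 0

    def run(partition_args: Sequence[K], cache: Dict[K, R]) -> None:
        nonlocal calls
        for a in partition_args:
            if a not in cache:
                cache[a] = method(a)
                calls += 1
            out.append((a, cache[a]))

    resident_cache: Dict[K, R] = {}
    for a in args:
        p = hash(a) % num_partitions
        if p == 0:
            run([a], resident_cache)  # resident partition: hit or compute now
        else:
            spilled[p].append(a)      # spill: defer to a later pass
    for p in range(1, num_partitions):
        run(spilled[p], {})           # only one partition's cache at a time
    return out, calls


if __name__ == "__main__":
    data = [3, 1, 3, 2, 1, 3]
    for f in (memoize_apply, sort_apply, partitioned_cache_apply):
        _, calls = f(data, lambda x: x * x)
        print(f.__name__, calls)  # one call per distinct value: 3
```

With enough memory, all three compute each distinct value exactly once; they differ in how much must be held in memory at once and in output order (input order, sorted order, and partition-grouped order, respectively).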