Fast algorithms for universal quantification in large databases

Universal quantification is not supported directly in most database systems despite the fact that it adds significant power to a system's query processing and inference capabilities, in particular for the analysis of many-to-many relationships and of set-valued attributes. One of the main reasons for this omission has been that universal quantification algorithms and their performance have not been explored for large databases. In this article, we describe and compare three known algorithms and one recently proposed algorithm for relational division, the algebra operator that embodies universal quantification. For each algorithm, we investigate the performance effects of explicit duplicate removal and referential integrity enforcement, variants for inputs larger than memory, and parallel execution strategies. Analytical and experimental performance comparisons illustrate the substantial differences among the algorithms. Moreover, the comparisons demonstrate that the recently proposed division algorithm evaluates a universal quantification predicate over two relations as fast as hash (semi-)join evaluates an existential quantification predicate over the same relations. Thus, existential and universal quantification can be supported with equal efficiency by adding the recently proposed algorithm to a query evaluation system. A second result of our study is that universal quantification should be expressed directly in a database query language, because most query optimizers do not recognize the rather indirect formulations available in SQL as relational division and therefore produce very poor evaluation plans for many universal quantification queries.
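To make the operator concrete, the following is a minimal Python sketch of relational division using hash tables and per-candidate bitmaps. It is an illustration in the spirit of a hash-based division algorithm, not a reproduction of any of the four algorithms the article evaluates. The function name `divide`, the in-memory list inputs, and the sample relations are all assumptions introduced here.

```python
def divide(dividend, divisor):
    """Return every quotient value q such that (q, d) occurs in
    `dividend` for each d in `divisor` (relational division).

    `dividend` is an iterable of (q, d) pairs; `divisor` is an
    iterable of d values. Both are hypothetical in-memory stand-ins
    for relations.
    """
    # Divisor table: assign each distinct divisor value a bit position.
    divisor_index = {}
    for d in divisor:
        divisor_index.setdefault(d, len(divisor_index))

    # Division by an empty divisor is vacuously true for all candidates.
    if not divisor_index:
        return {q for q, _ in dividend}

    full_mask = (1 << len(divisor_index)) - 1

    # Quotient table: one bitmap per quotient candidate.
    seen = {}
    for q, d in dividend:
        bit = divisor_index.get(d)
        if bit is None:
            continue  # this dividend tuple matches no divisor value
        # Setting a bit twice is harmless, so duplicate dividend
        # tuples need no separate removal step in this sketch.
        seen[q] = seen.get(q, 0) | (1 << bit)

    # A candidate qualifies only if every divisor bit is set.
    return {q for q, mask in seen.items() if mask == full_mask}


# Hypothetical example: which students took all required courses?
enrolled = [
    ("ann", "db"), ("ann", "os"), ("ann", "db"),   # duplicate on purpose
    ("bob", "db"),
    ("cat", "db"), ("cat", "os"), ("cat", "ai"),   # "ai" is not required
]
required = ["db", "os"]
print(sorted(divide(enrolled, required)))  # ['ann', 'cat']
```

The sketch makes a single pass over each input, which mirrors the access pattern of a hash (semi-)join. It assumes both tables fit in memory; inputs larger than memory and parallel execution, which the article investigates, are not covered here.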
