Fast joins using join indices

Abstract. Two new algorithms, “Jive join” and “Slam join,” are proposed for computing the join of two relations using a join index. The algorithms are duals: Jive join range-partitions input relation tuple ids and then processes each partition, while Slam join forms ordered runs of input relation tuple ids and then merges the results. Both algorithms make a single sequential pass through each input relation, in addition to one pass through the join index and two passes through a temporary file, whose size is half that of the join index. Both algorithms require only that the number of blocks in main memory is of the order of the square root of the number of blocks in the smaller relation. By storing intermediate and final join results in a vertically partitioned fashion, our algorithms need to manipulate less data in memory at a given time than other algorithms. The algorithms are resistant to data skew and adaptive to memory fluctuations. Selection conditions can be incorporated into the algorithms. Using a detailed cost model, the algorithms are analyzed and compared with competing algorithms. For large input relations, our algorithms perform significantly better than Valduriez's algorithm, the TID join algorithm, and hash join algorithms. An experimental study is also conducted to validate the analytical results and to demonstrate the performance characteristics of each algorithm in practice.

[1]  Michael Stonebraker,et al.  Operating system support for database management , 1981, CACM.

[2]  Yun Wang,et al.  An efficient hybrid join algorithm: a DB2 prototype , 1991, [1991] Proceedings. Seventh International Conference on Data Engineering.

[3]  David J. DeWitt,et al.  Design and implementation of the wisconsin storage system , 1985, Softw. Pract. Exp..

[4]  Leonard D. Shapiro,et al.  Join processing in database systems with large main memories , 1986, TODS.

[5]  Jiawei Han,et al.  Distance-associated join indices for spatial range search , 1992, [1992] Eighth International Conference on Data Engineering.

[6]  M. W. Blasgen,et al.  Storage and Access in Relational Data Bases , 1977, IBM Syst. J..

[7]  Goetz Graefe,et al.  Multi-table joins through bitmapped join indices , 1995, SGMD.

[8]  Guido Moerkotte,et al.  Access support in object bases , 1990, SIGMOD '90.

[9]  Nancy L. Martin,et al.  Join index, materialized view, and hybrid-hash join: a performance analysis , 1990, [1990] Proceedings. Sixth International Conference on Data Engineering.

[10]  Jeff Dozier Keynote address: access to data in NASA's Earth observing system , 1992, SIGMOD '92.

[11]  Michael Stonebraker,et al.  Implementation techniques for main memory database systems , 1984, SIGMOD '84.

[12]  Kjell Bratbergsengen,et al.  Hashing Methods and Relational Algebra Operations , 1984, VLDB.

[13]  Kenneth A. Ross,et al.  A New Client-Server Architecture for Distributed Query Processing , 1994 .

[14]  Latha S. Colby,et al.  Query processing for decision support: the SQLmpp solution , 1994, Proceedings of 3rd International Conference on Parallel and Distributed Information Systems.

[15]  Jiawei Han,et al.  Join Index Hierarchies for Supporting Efficient Navigations in Object-Oriented Databases , 1994, VLDB.

[16]  Patrick Valduriez,et al.  Join indices , 1987, TODS.

[17]  Abraham Silberschatz,et al.  Database System Concepts, 3rd Edition , 1991 .

[18]  C. Q. Lee,et al.  The Computer Journal , 1958, Nature.

[19]  Prashant Palvia,et al.  Approximating Block Accesses in Database Organizations , 1984, Inf. Process. Lett..

[20]  Christos Faloutsos,et al.  Quest: a project on database mining , 1994, SIGMOD '94.

[21]  Won Kim,et al.  A new way to compute the product and join of relations , 1980, SIGMOD '80.

[22]  Kevin Strehlo,et al.  Why decision support fails and how to fix it , 1995, SGMD.

[23]  Robert Kooi,et al.  The Optimization of Queries in Relational Databases , 1980 .

[24]  Goetz Graefe,et al.  Sort versus Hash Revisited , 1994, IEEE Trans. Knowl. Data Eng..

[25]  Doron Rotem,et al.  Multiprocessor Join Scheduling , 1993, IEEE Trans. Knowl. Data Eng..

[26]  Laura M. Haas,et al.  Seeking the truth about ad hoc join costs , 1997, The VLDB Journal.

[27]  Abraham Silberschatz,et al.  Database System Concepts , 1980 .

[28]  Margaret H. Dunham,et al.  Join processing in relational databases , 1992, CSUR.

[29]  Don S. Batory,et al.  On searching transposed files , 1978, ACM Trans. Database Syst..

[30]  Kenneth A. Ross Efficiently Following Object References for Large Object Collections and Small Main Memory , 1995, DOOD.

[31]  C. Mohan,et al.  Single Table Access Using Multiple Indexes: Optimization, Execution, and Concurrency Control Techniques , 1990, EDBT.

[32]  Hui Lei,et al.  Faster Joins, Self-Joins and Multi-Way Joins Using Join Indices , 1999, Data Knowl. Eng..

[33]  Doron Rotem Spatial join indices , 1991, [1991] Proceedings. Seventh International Conference on Data Engineering.

[34]  Patricia G. Selinger,et al.  Access path selection in a relational database management system , 1979, SIGMOD '79.

[35]  Goetz Graefe Five Performance Enhancements for Hybrid Hash Join , 1992 .

[36]  Michael J. Carey,et al.  A performance evaluation of pointer-based joins , 1990, SIGMOD '90.

[37]  David J. DeWitt,et al.  Benchmarking Database Systems A Systematic Approach , 1983, VLDB.

[38]  A RossKenneth,et al.  Fast joins using join indices , 1999, VLDB 1999.

[39]  Abraham Silberschatz,et al.  Database Systems Concepts , 1997 .