A Multi Join Algorithm Utilizing Double Indices

Join has long been one of the most expensive query operations in terms of processing time. This paper introduces a novel multi-join algorithm for joining multiple relations. The algorithm is based on a hash-based join of two relations that produces a double index, built by scanning each of the two relations once. Instead of moving the records into buckets, the algorithm builds the double index; because a complete hash algorithm is used, collisions are eliminated. The double index is then divided into join buckets, each holding the entries of the same category from the two relations. Buckets with matching keys are joined to produce joined buckets. The result is a complete join index of the two relations, obtained without joining the relations themselves. Building the join index for two categories takes O(m log m) time, where m is the size of each category. Summed over all buckets, the algorithm runs in O(n log m) time, where n is the total number of entries across all buckets (n/m buckets, each costing O(m log m)). The join index can be used to materialize the joined relation when required. Otherwise, together with the join indices of other relations, it forms a lattice that supports multi-join operations with minimal I/O. This lattice of join indices can fit in main memory, which reduces the running time of the multi-join algorithm.
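
The core step, building a join index from a double index, can be sketched in Python. This is a minimal illustration under stated assumptions, not the paper's implementation: relations are Python lists of tuples, join keys are hashable and orderable, and entries are assigned to join buckets by Python's built-in `hash(key) % num_buckets` rather than the paper's complete hash algorithm. The names `build_double_index`, `join_bucket`, `build_join_index`, and `num_buckets` are hypothetical. Each relation is scanned once to record `(key, row id)` entries; each bucket's two entry lists are then sorted and merged (sorting is an assumption chosen to match the O(m log m) per-category bound), and the output holds only row-id pairs.

```python
from collections import defaultdict
from typing import Any, Callable, Sequence

# A join index: pairs of (row id in R, row id in S) whose join keys are equal.
JoinIndex = list[tuple[int, int]]


def build_double_index(r: Sequence[Any], s: Sequence[Any],
                       r_key: Callable[[Any], Any], s_key: Callable[[Any], Any],
                       num_buckets: int = 8) -> dict[int, tuple[list, list]]:
    """Scan R and S once each, recording (key, row id) entries instead of moving records.

    Assumed bucketing rule: hash(key) % num_buckets. Each bucket keeps R's
    entries and S's entries in two separate lists."""
    buckets: dict[int, tuple[list, list]] = defaultdict(lambda: ([], []))
    for rid, rec in enumerate(r):
        key = r_key(rec)
        buckets[hash(key) % num_buckets][0].append((key, rid))
    for sid, rec in enumerate(s):
        key = s_key(rec)
        buckets[hash(key) % num_buckets][1].append((key, sid))
    return buckets


def join_bucket(r_entries: list, s_entries: list) -> JoinIndex:
    """Sort both sides of one bucket (O(m log m)) and merge runs of equal keys."""
    r_sorted, s_sorted = sorted(r_entries), sorted(s_entries)
    out: JoinIndex = []
    i = j = 0
    while i < len(r_sorted) and j < len(s_sorted):
        kr, ks = r_sorted[i][0], s_sorted[j][0]
        if kr < ks:
            i += 1
        elif kr > ks:
            j += 1
        else:
            # Find the run of entries sharing this key on each side.
            i_end, j_end = i, j
            while i_end < len(r_sorted) and r_sorted[i_end][0] == kr:
                i_end += 1
            while j_end < len(s_sorted) and s_sorted[j_end][0] == kr:
                j_end += 1
            # Every R row in the run pairs with every S row in the run.
            out.extend((rid, sid) for _, rid in r_sorted[i:i_end]
                       for _, sid in s_sorted[j:j_end])
            i, j = i_end, j_end
    return out


def build_join_index(r, s, r_key, s_key, num_buckets: int = 8) -> JoinIndex:
    """Join bucket by bucket; only row ids are emitted, no records are copied."""
    index: JoinIndex = []
    for r_entries, s_entries in build_double_index(r, s, r_key, s_key, num_buckets).values():
        index.extend(join_bucket(r_entries, s_entries))
    # Bucket order depends on hashing (str hashes are salted per process), so sort.
    return sorted(index)


employees = [("ann", 10), ("bob", 20), ("cy", 10), ("dee", 30)]
depts = [(10, "sales"), (20, "eng"), (40, "ops")]
print(build_join_index(employees, depts, lambda e: e[1], lambda d: d[0]))
# [(0, 0), (1, 1), (2, 0)]
```

The index stores one pair of row ids per result tuple instead of the concatenated records, and each base relation is scanned only once while it is built. Any bucketing rule that sends equal keys to the same bucket yields the same index; with `num_buckets=1` the sketch degenerates to a plain sort-merge join.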

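The abstract states that a join index can either materialize the joined relation on demand or be combined with other join indices for multi-join processing. The sketch below illustrates both uses under simple assumptions: relations are Python lists of tuples, and a join index is a list of row-id tuples with one id per relation. The functions `materialize` and `compose` are hypothetical, and chaining two join indices through a shared relation is only one simple way to combine them; it does not reproduce the paper's lattice of join indices.

```python
from collections import defaultdict
from typing import Sequence


def materialize(relations: Sequence[Sequence[tuple]],
                join_index: list[tuple[int, ...]]) -> list[tuple]:
    """Fetch only the rows named by each join-index entry and concatenate them."""
    return [sum((rel[i] for rel, i in zip(relations, ids)), ()) for ids in join_index]


def compose(rs_index: list[tuple[int, int]],
            st_index: list[tuple[int, int]]) -> list[tuple[int, int, int]]:
    """Chain an R-S join index with an S-T join index on the shared S row id."""
    t_ids_by_s: dict[int, list[int]] = defaultdict(list)
    for s_id, t_id in st_index:
        t_ids_by_s[s_id].append(t_id)
    return [(r_id, s_id, t_id)
            for r_id, s_id in rs_index
            for t_id in t_ids_by_s.get(s_id, [])]


employees = [("ann", 10), ("bob", 20), ("cy", 10)]
depts = [(10, "sales"), (20, "eng")]
sites = [("sales", "paris"), ("eng", "oslo"), ("sales", "rome")]
emp_dept = [(0, 0), (1, 1), (2, 0)]   # employee dept id = dept id
dept_site = [(0, 0), (0, 2), (1, 1)]  # dept name = site dept name
triples = compose(emp_dept, dept_site)
print(triples)  # [(0, 0, 0), (0, 0, 2), (1, 1, 1), (2, 0, 0), (2, 0, 2)]
print(materialize([employees, depts, sites], triples)[0])
# ('ann', 10, 10, 'sales', 'sales', 'paris')
```

Only the row-id lists are touched until `materialize` is called, so when the join indices fit in main memory, the base relations are read only for the rows that appear in the final result.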