An Efficient Multi Join Algorithm Utilizing a Lattice of Double Indices

This paper introduces a novel multi-join algorithm for joining multiple relations. The algorithm builds on a hash-based join of two relations, but instead of moving records into buckets it scans the two relations once and builds a double index, which avoids the collisions that can occur in a complete hash algorithm. The double index is divided into join buckets that hold similar categories from the two relations. The algorithm then joins buckets with similar keys to produce joined buckets, which ultimately yields a complete join index of the two relations without joining the relations themselves. Building the join index of two categories takes O(m log m) time, where m is the size of each category, for a total of O(n log m) over all buckets. The join index can be used to materialize the joined relation if required. Otherwise, it is combined with the join indices of other relations to build a lattice that supports multi-join operations with minimal I/O. The lattice of join indices can be fitted into main memory to reduce the time complexity of the multi-join algorithm.

Keywords: Multi join, Relation, Lattice, Join indices.
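The two-relation step described above can be illustrated with a small sketch. It is not the paper's implementation: the function names, the use of Python's built-in `hash` to assign categories, the `num_categories` parameter, and the sort-then-merge step inside each category (chosen because it matches the stated O(m log m) cost per category) are all assumptions made for illustration. The `compose` helper shows only one simple way to chain two join indices; it does not attempt to reproduce the lattice structure.

```python
from collections import defaultdict

def build_double_index(r, s, key_r, key_s, num_categories=8):
    """Scan each relation once and record (key, record id) entries per category.

    Records are never moved; only their ids enter the index (assumed layout).
    """
    index = defaultdict(lambda: ([], []))  # category -> (entries of r, entries of s)
    for rid, rec in enumerate(r):
        k = rec[key_r]
        index[hash(k) % num_categories][0].append((k, rid))
    for sid, rec in enumerate(s):
        k = rec[key_s]
        index[hash(k) % num_categories][1].append((k, sid))
    return index

def join_category(left, right):
    """Sort both sides of one category, then merge to pair ids with equal keys."""
    left, right = sorted(left), sorted(right)
    pairs, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i][0] < right[j][0]:
            i += 1
        elif left[i][0] > right[j][0]:
            j += 1
        else:
            key, i_end, j_end = left[i][0], i, j
            while i_end < len(left) and left[i_end][0] == key:
                i_end += 1
            while j_end < len(right) and right[j_end][0] == key:
                j_end += 1
            # Every id in the left run matches every id in the right run.
            pairs.extend((a, b) for _, a in left[i:i_end] for _, b in right[j:j_end])
            i, j = i_end, j_end
    return pairs

def join_index(r, s, key_r, key_s, num_categories=8):
    """Join index of r and s: a sorted list of (rid, sid) pairs, no records copied."""
    index = build_double_index(r, s, key_r, key_s, num_categories)
    pairs = []
    for left, right in index.values():
        pairs.extend(join_category(left, right))
    return sorted(pairs)

def materialize(r, s, pairs):
    """Build the joined relation from a join index, only when it is needed."""
    return [{**r[i], **s[j]} for i, j in pairs]

def compose(rs, st):
    """Chain two join indices (r-s and s-t) into (rid, sid, tid) triples."""
    by_s = defaultdict(list)
    for sid, tid in st:
        by_s[sid].append(tid)
    return [(rid, sid, tid) for rid, sid in rs for tid in by_s.get(sid, [])]

# Hypothetical sample relations, represented as lists of dicts.
orders = [{"cust": 1, "item": "a"}, {"cust": 2, "item": "b"}, {"cust": 1, "item": "c"}]
customers = [{"id": 1, "city": "X"}, {"id": 3, "city": "Y"}, {"id": 2, "city": "X"}]
cities = [{"name": "X", "country": "C1"}, {"name": "Y", "country": "C2"}]

oc = join_index(orders, customers, "cust", "id")     # [(0, 0), (1, 2), (2, 0)]
cc = join_index(customers, cities, "city", "name")   # [(0, 0), (1, 1), (2, 0)]
print(compose(oc, cc))                               # [(0, 0, 0), (1, 2, 0), (2, 0, 0)]
```

If n is taken to be the total number of indexed records and each category holds about m of them, there are roughly n/m categories, so the per-category cost of O(m log m) sums to O(n log m), which agrees with the bound quoted in the abstract. Because the join indices hold only record ids, they stay small relative to the relations, which is what makes keeping several of them in main memory plausible.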
