Progressive Hash-Merge Join Algorithm

In data-stream and Web scenarios, where data arrives at highly variable and unpredictable rates, most fast join algorithms to date improve efficiency by switching to an external join stage as soon as their inputs block. This approach runs into two problems: the limitations of the external join itself and its actual execution time. In particular, classical hash-based progressive two-way join techniques fail to deliver acceptable performance when the delays are, on the whole, relatively short and intermittent. We propose a new progressive join algorithm based on hash-merge that improves query response time by splitting a single merging task into multiple subtasks whose sizes depend on the length of the delay interval. In addition, a refined replacement selection tree and a fine-grained timestamp are applied: the former makes good use of limited memory, and the latter ensures correctness. Theoretical analysis and experimental results show that our technique delivers results significantly faster under both reliable and unreliable networks.
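The idea of splitting one merge into interval-sized subtasks can be illustrated with a minimal sketch. It is not the authors' implementation: the class `ResumableMergeJoin`, the helper `budget_for`, the `(key, payload)` row layout, and the sizing rule are all hypothetical, and the refined replacement selection tree and the timestamps are left out. The sketch only shows how a sort-merge join over two key-sorted runs can be paused and resumed, so that each subtask does an amount of work chosen from the expected delay.

```python
from typing import List, Tuple

Row = Tuple[int, str]  # (join key, payload); hypothetical row layout


class ResumableMergeJoin:
    """Merge-joins two key-sorted runs in budget-limited subtasks."""

    def __init__(self, left: List[Row], right: List[Row]) -> None:
        self.left, self.right = left, right
        self.i = 0  # cursor into left, preserved between subtasks
        self.j = 0  # cursor into right, preserved between subtasks

    def done(self) -> bool:
        return self.i >= len(self.left) or self.j >= len(self.right)

    def step(self, budget: int) -> List[Tuple[Row, Row]]:
        """Run one subtask that consumes roughly `budget` input rows.

        A group of equal keys is never split, so a subtask may
        overshoot the budget by the size of one such group.
        """
        out: List[Tuple[Row, Row]] = []
        used = 0
        while used < budget and not self.done():
            lk, rk = self.left[self.i][0], self.right[self.j][0]
            if lk < rk:
                self.i += 1
                used += 1
            elif lk > rk:
                self.j += 1
                used += 1
            else:
                # Find the end of the equal-key group on each side.
                i_end = self.i
                while i_end < len(self.left) and self.left[i_end][0] == lk:
                    i_end += 1
                j_end = self.j
                while j_end < len(self.right) and self.right[j_end][0] == lk:
                    j_end += 1
                # Emit the cross product of the two groups.
                for l_row in self.left[self.i:i_end]:
                    for r_row in self.right[self.j:j_end]:
                        out.append((l_row, r_row))
                used += (i_end - self.i) + (j_end - self.j)
                self.i, self.j = i_end, j_end
        return out


def budget_for(interval_seconds: float, rows_per_second: float) -> int:
    """Hypothetical sizing rule: rows that fit into the expected delay."""
    return max(1, int(interval_seconds * rows_per_second))
```

A caller would invoke `step(budget_for(...))` each time the inputs block and return to in-memory processing as soon as new tuples arrive. Because the cursors persist between calls, the concatenated output of all subtasks equals the output of one uninterrupted merge.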
