Non-blocking Disk-Tape Join Algorithm for Data on Tertiary Storage Systems

The data accumulated by business and scientific applications have grown so large that they can be stored only on tape. To make full use of these data, tools for data analysis and data mining must be developed. In such applications, disk-resident data often need to be joined with tape-resident data. Many disk-tape join methods have been proposed, for example CDT-NB and CDT-GH, but all of them are blocking: the user must wait a long time before the first result appears. Because a disk-tape join often takes a long time to finish, it is desirable to produce join results as early as possible without degrading overall join performance too much. The non-blocking disk-tape join (NDT) presented in this paper is the first disk-tape join algorithm designed with this goal in mind. It has three phases: the hashing phase, the merging phase, and the probing phase. Join results can be produced in every phase. In the hashing phase, tuples of the disk-resident relation and the tape-resident relation are read into memory simultaneously and joined there. The merging phase joins the tuples that were flushed to disk during the hashing phase. After the first two phases, the disk-resident relation has been partitioned, and in the probing phase it is joined with the remainder of the tape-resident relation. Experimental results show that NDT produces join results much earlier than the state-of-the-art CDT-GH, while its overall performance is about the same as that of CDT-GH.
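The three phases can be illustrated with a small in-memory simulation. This is only a sketch under stated assumptions, not the paper's implementation: the abstract does not say how tuples are flushed, when the hashing phase ends, or how duplicates are avoided. Here it is assumed that the hashing phase ends once the disk relation is exhausted, that the largest partition is flushed when memory is full, and that both sides of a partition are flushed together as a numbered run, so that runs with the same number never need to be joined again. The names `ndt_join_sketch`, `num_partitions`, and `mem_limit` are hypothetical, and Python lists stand in for disk and tape.

```python
from collections import defaultdict


def ndt_join_sketch(disk_rel, tape_rel, num_partitions=4, mem_limit=4):
    """Yield (phase, key, disk_payload, tape_payload) for each matching pair.

    Illustrative only: lists stand in for disk and tape, and the flushing
    policy is an assumption, not taken from the paper.
    """
    mem_r = defaultdict(list)   # in-memory disk-relation tuples, per partition
    mem_s = defaultdict(list)   # in-memory tape-relation tuples, per partition
    runs_r = defaultdict(list)  # flushed runs of the disk relation, per partition
    runs_s = defaultdict(list)  # flushed runs of the tape relation, per partition
    in_memory = 0

    def part(key):
        return hash(key) % num_partitions

    def flush(p):
        # Flush both sides of partition p together as one run, so tuples in
        # same-numbered runs have already been joined with each other.
        nonlocal in_memory
        in_memory -= len(mem_r[p]) + len(mem_s[p])
        runs_r[p].append(mem_r.pop(p, []))
        runs_s[p].append(mem_s.pop(p, []))

    # Hashing phase: alternate reads from both relations, probe the opposite
    # in-memory table, then insert. Ends when the disk relation is exhausted.
    tape_iter = iter(tape_rel)
    for r in disk_rel:
        for side, tup in (("R", r), ("S", next(tape_iter, None))):
            if tup is None:  # tape relation ran out first
                continue
            key, payload = tup
            p = part(key)
            own, other = (mem_r, mem_s) if side == "R" else (mem_s, mem_r)
            for okey, opayload in other[p]:
                if okey == key:
                    d, t = (payload, opayload) if side == "R" else (opayload, payload)
                    yield ("hashing", key, d, t)
            own[p].append(tup)
            in_memory += 1
            if in_memory > mem_limit:
                victim = max(range(num_partitions),
                             key=lambda q: len(mem_r[q]) + len(mem_s[q]))
                flush(victim)

    # Whatever is still in memory becomes the final run of each partition.
    for p in range(num_partitions):
        flush(p)

    # Merging phase: join runs that were never in memory at the same time.
    for p in range(num_partitions):
        for i, r_run in enumerate(runs_r[p]):
            for j, s_run in enumerate(runs_s[p]):
                if i == j:
                    continue  # already joined during the hashing phase
                for rk, rv in r_run:
                    for sk, sv in s_run:
                        if rk == sk:
                            yield ("merging", rk, rv, sv)

    # Probing phase: the rest of the tape relation probes the partitioned
    # disk relation.
    for sk, sv in tape_iter:
        for r_run in runs_r[part(sk)]:
            for rk, rv in r_run:
                if rk == sk:
                    yield ("probing", rk, rv, sv)
```

Because the function is a generator, a caller receives the first matches during the hashing phase without waiting for the whole join to finish, which is the non-blocking behaviour the abstract describes. A real implementation would also have to schedule disk and tape I/O, which this sketch ignores.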
