In-Memory Join Algorithms on GPUs for Large-Data

In traditional databases, the join is one of the most computationally expensive operations in query processing. In recent years, GPUs have been adopted to accelerate join processing because of their massive parallelism and high memory bandwidth. However, because GPU memory capacity is limited and GPUs lack virtual memory management, handling relations that exceed the capacity of GPU memory remains a challenge for GPU-based join algorithms. Because GPUs provide high computing throughput while the bandwidth of data communication between CPUs and GPUs is low, data must be partitioned to fit the GPU and to reduce the cost of data transfer. Furthermore, a series of novel techniques have been developed on GPUs that can benefit join algorithms. In this work, we focus on optimizing join processing on large relations and propose designs for in-memory hash join and sort-merge join on GPUs. We present a GPU-based data partitioning method implemented with a pipeline mechanism. In addition, we apply shuffle instructions and CUDA streams in our algorithms to make the best use of the GPUs. Experimental results indicate that our hash join algorithm delivers up to 1.51X and 1.24X speedup over the state-of-the-art hash join algorithm on CPUs on an NVIDIA GTX1080ti-Pascal GPU and a TitanV-Volta GPU, respectively. For sort-merge join, our algorithm achieves up to 3.52X and 2.21X improvements over the CPU baselines on the same GPUs, respectively.
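The idea of partitioning relations so that each piece fits a limited memory budget can be illustrated with a minimal CPU-side sketch in Python. This is not the GPU implementation described above: the function names (`partition`, `hash_join`, `partitioned_hash_join`) and the `num_partitions` parameter are hypothetical, and a real GPU version would choose the partition count from the device memory size and overlap transfers with computation. The sketch only shows why co-partitioning by join key lets each pair of partitions be joined independently.

```python
from collections import defaultdict


def partition(relation, num_partitions):
    """Split (key, payload) tuples into buckets by a hash of the join key."""
    parts = [[] for _ in range(num_partitions)]
    for key, payload in relation:
        parts[hash(key) % num_partitions].append((key, payload))
    return parts


def hash_join(build, probe):
    """Plain hash join: build a table on one input, probe it with the other."""
    table = defaultdict(list)
    for key, payload in build:
        table[key].append(payload)
    return [(key, b, p) for key, p in probe for b in table.get(key, [])]


def partitioned_hash_join(r, s, num_partitions):
    """Join r and s partition by partition.

    Tuples with equal keys land in the same partition index in both
    relations, so each pair (r_parts[i], s_parts[i]) can be processed on
    its own; in a GPU setting this is the unit that would be sized to
    fit device memory (an assumption of this sketch).
    """
    r_parts = partition(r, num_partitions)
    s_parts = partition(s, num_partitions)
    out = []
    for rp, sp in zip(r_parts, s_parts):
        out.extend(hash_join(rp, sp))
    return out
```

For example, `partitioned_hash_join([(1, "a"), (2, "b")], [(2, "x")], 4)` yields the single match for key 2, and the result set does not depend on the number of partitions chosen.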
