Parallel Hybrid Join Algorithm on GPU

In data analytics applications, join is a common and time-consuming operation, so optimizing join algorithms can significantly speed up query processing. The emergence of GPUs offers massive parallelism for improving join performance. Hash join (HJ) and sort-merge join (SMJ), two widely used join algorithms, have both been shown to be effective for join processing on GPUs. Each has its own advantages and drawbacks, which creates an opportunity to combine the strengths of HJ and SMJ on GPUs. When a join is processed on a GPU, data must be transferred between the CPU and the GPU because the GPU has its own discrete memory, and the high PCIe transfer overhead degrades performance. As GPUs become more powerful, the gap between data transfer time and GPU execution time widens, which makes it even harder to implement an efficient join on GPUs. In this paper, we focus on optimizing join algorithms on GPUs. We propose the Parallel Hybrid Join algorithm on GPUs (PHYJ), which combines the advantages of HJ and SMJ and overlaps data communication with GPU execution through a pipeline mechanism. In our evaluation, PHYJ achieves up to 1.72X and 1.55X speedup over state-of-the-art HJ and SMJ algorithms, respectively, on an NVIDIA GTX 1080 Ti (Pascal) GPU. On the Titan V (Volta) GPU, it achieves up to 1.54X and 1.42X improvement over the baseline HJ and SMJ algorithms, respectively.
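For readers unfamiliar with the two baseline algorithms and with the idea of overlapping transfer and execution, the following is a minimal CPU-side sketch in Python. It is not the PHYJ algorithm and not GPU code: the function names, the (key, payload) tuple layout, and the uniform per-chunk cost model are all assumptions made for illustration only.

```python
from collections import defaultdict


def hash_join(r, s):
    """Equi-join of two lists of (key, payload) tuples.

    Build a hash table on r, then probe it once per tuple of s.
    """
    table = defaultdict(list)
    for key, payload in r:
        table[key].append(payload)
    return [(key, rp, sp) for key, sp in s for rp in table.get(key, ())]


def sort_merge_join(r, s):
    """Equi-join that sorts both inputs by key, then merges them."""
    r = sorted(r, key=lambda t: t[0])
    s = sorted(s, key=lambda t: t[0])
    out = []
    i = j = 0
    while i < len(r) and j < len(s):
        if r[i][0] < s[j][0]:
            i += 1
        elif r[i][0] > s[j][0]:
            j += 1
        else:
            # Find the run of equal keys on each side, then emit
            # the cross product of the two runs.
            key = r[i][0]
            i_end = i
            while i_end < len(r) and r[i_end][0] == key:
                i_end += 1
            j_end = j
            while j_end < len(s) and s[j_end][0] == key:
                j_end += 1
            for _, rp in r[i:i_end]:
                for _, sp in s[j:j_end]:
                    out.append((key, rp, sp))
            i, j = i_end, j_end
    return out


def serial_time(n_chunks, transfer, compute):
    """Total time when each chunk is transferred, then processed (no overlap)."""
    return n_chunks * (transfer + compute)


def pipelined_time(n_chunks, transfer, compute):
    """Total time for an idealized two-stage pipeline with uniform chunks.

    Assumption: the transfer of chunk k+1 fully overlaps the processing
    of chunk k, so the slower stage sets the steady-state rate.
    """
    if n_chunks == 0:
        return 0
    return transfer + compute + (n_chunks - 1) * max(transfer, compute)
```

Both join functions return the same set of matches, differing only in output order. Under the idealized cost model, four chunks with a transfer cost of 3 and a compute cost of 2 take 20 time units serially and 14 when pipelined, which illustrates why the benefit of overlapping is bounded by the slower of the two stages.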
