Join algorithms on GPUs: A revisit after seven years

Implementing database operations on parallel platforms has gained considerable momentum over the past decade, and a number of studies have shown the potential of GPUs to speed up such operations. In this paper, we present an empirical evaluation of a state-of-the-art study of GPU-based join processing published in SIGMOD'08, which presents four major join algorithms and a number of join-related primitives on GPUs. Since 2008, the compute capabilities of GPUs have grown at a faster pace than those of multi-core CPUs. We run a comprehensive set of experiments to study how join operations can benefit from this rapid expansion of GPU capabilities. On today's mainstream GPU and CPU hardware, the GPU join program achieves up to a 20X speedup in end-to-end running time over a highly optimized CPU version, significantly larger than the 7X performance gap reported in the original paper. We also present improved GPU programs that take advantage of newer GPU hardware and software features such as the read-only data cache, the large L2 cache, and shuffle instructions. These optimizations yield an additional performance improvement of 30-52% in various components of the GPU program. Finally, we evaluate the same program from several other perspectives, including energy efficiency, floating-point performance, and program development considerations, to further reveal the advantages and limitations of using GPUs for database operations. In summary, we find that today's GPUs are significantly faster in floating-point operations, can process more on-board data, and achieve higher energy efficiency than modern CPUs.
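As a rough illustration of how join-related primitives compose into a join algorithm, the following Python sketch emulates, sequentially on the CPU, the map, histogram, exclusive prefix-scan, and scatter steps of a radix "split" (partition) primitive, and then uses it to build a partitioned hash join. It is a minimal sketch under our own assumptions (integer join keys, a hypothetical `bits` parameter giving 2**bits partitions, and function names invented for illustration); it is not the code evaluated in the paper.

```python
from collections import defaultdict


def exclusive_scan(counts):
    """Exclusive prefix sum: out[i] = sum(counts[:i])."""
    out, running = [], 0
    for c in counts:
        out.append(running)
        running += c
    return out


def split(tuples, bits):
    """Radix-partition (key, payload) tuples on the low `bits` bits of the key.

    The data-parallel steps are emulated sequentially here:
      map             -> partition id of each tuple
      histogram       -> number of tuples per partition
      exclusive scan  -> start offset of each partition in the output
      scatter         -> write each tuple into its partition's slot
    Returns the partitioned array, per-partition start offsets, and counts.
    """
    fanout = 1 << bits
    mask = fanout - 1
    pids = [key & mask for key, _ in tuples]          # map
    counts = [0] * fanout
    for p in pids:                                    # histogram
        counts[p] += 1
    starts = exclusive_scan(counts)                   # prefix scan
    cursor = list(starts)                             # next free slot per partition
    out = [None] * len(tuples)
    for t, p in zip(tuples, pids):                    # scatter (stable within a partition)
        out[cursor[p]] = t
        cursor[p] += 1
    return out, starts, counts


def partitioned_hash_join(r, s, bits=2):
    """Equi-join r and s on key; returns (key, r_payload, s_payload) triples."""
    r_out, r_start, r_cnt = split(r, bits)
    s_out, s_start, s_cnt = split(s, bits)
    result = []
    # Matching keys always land in the same partition, so each partition pair
    # can be joined independently (on a GPU, these would run in parallel).
    for p in range(1 << bits):
        table = defaultdict(list)                     # build side: partition p of r
        for key, rp in r_out[r_start[p]:r_start[p] + r_cnt[p]]:
            table[key].append(rp)
        for key, sp in s_out[s_start[p]:s_start[p] + s_cnt[p]]:  # probe side
            for rp in table.get(key, []):
                result.append((key, rp, sp))
    return result


if __name__ == "__main__":
    r = [(1, "a"), (2, "b"), (5, "c")]
    s = [(1, "x"), (5, "y"), (5, "z"), (7, "w")]
    print(partitioned_hash_join(r, s))
```

With `bits=2`, keys 1 and 5 both fall into partition 1, so all matches are found while joining that single partition pair. Partitioning first keeps each build table small, which is the usual motivation for radix partitioning on hardware with small fast memories; the choice of `bits` here is arbitrary.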
