Multipredicate Join Algorithms for Accelerating Relational Graph Processing on GPUs

Recent work has demonstrated that programmable GPUs can be advantageous for relational query processing on analytical workloads. In this paper, we take a closer look at graph problems such as finding all triangles and all four-cliques of a graph. In particular, we present two different join algorithms for the GPU. The first is an implementation of Leapfrog Triejoin (LFTJ), a recently presented worst-case optimal multi-predicate join algorithm. The second is a novel approach, inspired by the former but more suitable for GPU architectures. Our preliminary performance benchmarks show that, for both approaches, using GPUs is cost-effective: the GPU implementation outperforms the respective CPU variant. While the second algorithm is faster overall, it comes with increased implementation complexity and higher storage requirements for intermediate results. Furthermore, both of our algorithms are competitive with the hand-written C++ implementation for finding triangles and four-cliques in the graph-processing system GraphLab executing on a multi-core CPU.
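To make the triangle query concrete, the following is a minimal CPU-side sketch in Python of the idea behind a leapfrog-style join. It is not the GPU implementation described above. It treats triangle listing as the three-way join E(x, y), E(y, z), E(x, z) with x < y < z and intersects sorted neighbor lists by alternately seeking past the other list's current key. The function names `leapfrog_intersect` and `triangles` and the edge-list input format are assumptions made for illustration.

```python
from bisect import bisect_left
from collections import defaultdict


def leapfrog_intersect(a, b):
    """Intersect two sorted lists by seeking each one past the other's key."""
    out = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            # Seek in a to the first key >= b[j] (binary search from i + 1).
            i = bisect_left(a, b[j], i + 1)
        else:
            # Seek in b to the first key >= a[i].
            j = bisect_left(b, a[i], j + 1)
    return out


def triangles(edges):
    """List triangles (x, y, z) with x < y < z from an undirected edge list."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[min(u, v)].add(max(u, v))  # orient each edge low -> high
    nbrs = {u: sorted(vs) for u, vs in adj.items()}

    result = []
    for x in sorted(nbrs):
        for y in nbrs[x]:
            # z must be a higher-numbered neighbor of both x and y.
            for z in leapfrog_intersect(nbrs[x], nbrs.get(y, [])):
                result.append((x, y, z))
    return result
```

For example, calling `triangles` on the six edges of the complete graph on vertices 0 to 3 yields its four triangles. Orienting every edge from the lower to the higher vertex id ensures each triangle is reported once; a four-clique query could be sketched the same way with one more nested intersection.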
