Fast Triangle Counting on GPU

Triangle counting is one of the most basic graph applications and is used to solve many real-world problems across a wide variety of domains. Exploiting the massive parallelism of the Graphics Processing Unit (GPU) to accelerate triangle counting is prevalent. We observe that state-of-the-art GPU-based studies, which focus on improving load balancing, still inherently incur a large number of random memory accesses that degrade performance. In this paper, we design a prefetching scheme that buffers the neighbor list of the vertex being processed in fast shared memory in advance, avoiding the high latency of random global memory accesses. We also adopt a degree-based graph reordering technique and design a simple heuristic to distribute the workload evenly. Compared with last year's state-of-the-art HPEC Graph Challenge champion, we improve the performance of triangle counting by up to $5.9\times$, achieving $> 10^{9}$ TEPS (traversed edges per second) on a single GPU for many large real-world graphs from the Graph Challenge datasets.
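
As a rough illustration of the degree-based reordering idea mentioned above, the following is a minimal CPU-side sketch in Python, not the authors' GPU kernel. It assumes a common formulation: vertices are ranked by degree, each edge is oriented from the lower-ranked to the higher-ranked endpoint, and triangles are counted by intersecting sorted neighbor lists. The function names `count_triangles` and `intersect_sorted` are hypothetical, and the shared-memory prefetching and workload-distribution heuristic are not modeled here.

```python
def intersect_sorted(a, b):
    """Return the number of common elements of two sorted lists (two-pointer merge)."""
    i = j = common = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            common += 1
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return common


def count_triangles(edges):
    """Count triangles in an undirected simple graph given as (u, v) pairs."""
    # Build undirected adjacency sets; self-loops and duplicate edges are ignored.
    adj = {}
    for u, v in edges:
        if u == v:
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    # Degree-based reordering (assumed variant): rank vertices by (degree, id),
    # so low-degree vertices receive small ranks.
    order = sorted(adj, key=lambda v: (len(adj[v]), v))
    rank = {v: r for r, v in enumerate(order)}

    # Orient every edge from lower rank to higher rank and keep each
    # out-neighbor list sorted, which bounds the list length of hub vertices.
    out = {
        rank[v]: sorted(rank[w] for w in adj[v] if rank[w] > rank[v])
        for v in adj
    }

    # Each triangle is counted exactly once, at its lowest-ranked edge.
    total = 0
    for nbrs_u in out.values():
        for v in nbrs_u:
            total += intersect_sorted(nbrs_u, out[v])
    return total
```

For example, `count_triangles([(0, 1), (1, 2), (0, 2), (2, 3)])` returns 1. On a GPU, the inner intersection is the part that would read one neighbor list repeatedly, which is where buffering that list in shared memory could plausibly help.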
