Accelerating Triangle Counting on GPU

Triangle counting is an important problem in graph mining, and GPU implementations have improved its performance substantially in recent years. Rather than proposing a new GPU triangle counting algorithm, this paper proposes a novel lightweight graph preprocessing method that accelerates many state-of-the-art GPU triangle counting algorithms without changing their implementations or data structures. Specifically, we identify computing patterns common to existing algorithms and abstract two analytic models that quantify how workload imbalance and diversity in these patterns affect performance. Because optimizing these models is NP-hard, we propose approximate solutions: determining edge directions to balance workloads, and reordering vertices to maximize the degree of parallelism within GPU blocks. Finally, extensive experiments confirm the significant performance improvement and high usability of our approach.
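To make the role of edge directions concrete, the following is a minimal CPU-side sketch in Python. It is not the preprocessing method proposed in the paper: it uses the widely known degree-based orientation heuristic as a stand-in, and the function names `orient_by_degree`, `count_triangles`, and `max_out_degree` are hypothetical. Each undirected edge is directed from its lower-ranked endpoint to its higher-ranked one, triangles are counted by intersecting out-neighbor sets, and the largest out-neighbor list is reported as a rough proxy for workload imbalance.

```python
from collections import defaultdict


def orient_by_degree(edges):
    """Direct each undirected edge from the lower-ranked endpoint to the higher.

    Rank is (degree, vertex id). This is a common heuristic, used here only
    as an illustrative assumption; it is not the paper's edge-direction method.
    """
    degree = defaultdict(int)
    unique = set()
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        e = (min(u, v), max(u, v))
        if e not in unique:  # ignore duplicate edges
            unique.add(e)
            degree[u] += 1
            degree[v] += 1

    out = defaultdict(set)
    for u, v in unique:
        if (degree[u], u) < (degree[v], v):
            out[u].add(v)
        else:
            out[v].add(u)
    return out


def count_triangles(out):
    """Count each triangle once by intersecting out-neighbor sets."""
    total = 0
    for u, nbrs in list(out.items()):
        for v in nbrs:
            # Common out-neighbors of u and v close a triangle u -> v -> w.
            total += len(nbrs & out.get(v, set()))
    return total


def max_out_degree(out):
    """Largest out-neighbor list: a rough proxy for the heaviest work item."""
    return max((len(nbrs) for nbrs in out.values()), default=0)


if __name__ == "__main__":
    # A triangle 0-1-2 with a pendant vertex 3 attached to vertex 2.
    example = [(0, 1), (1, 2), (0, 2), (2, 3)]
    directed = orient_by_degree(example)
    print(count_triangles(directed), max_out_degree(directed))
```

Because the orientation follows a total order on vertices, the directed graph is acyclic and every triangle is counted exactly once. Changing only the ranking rule changes the sizes of the out-neighbor lists, and therefore how evenly the intersection work is spread, without touching the counting routine itself.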
