H-INDEX: Hash-Indexing for Parallel Triangle Counting on GPUs

Triangle counting is a graph algorithm that calculates the number of triangles involving each vertex in a graph. Briefly, a triangle consists of three vertices of a graph in which each vertex shares an edge with each of the other two. Consequently, list intersection, which identifies the neighbors that two adjacent vertices have in common, becomes the core operation of triangle counting. Meanwhile, attracted by the enormous parallel computing potential of Graphics Processing Units (GPUs), numerous efforts have been devoted to deploying triangle counting algorithms on GPUs. While state-of-the-art intersection algorithms, such as merge-path and binary-search, perform well on traditional multi-core CPU systems, deploying them on massively parallel GPUs turns out to be challenging. In particular, the merge-path based approach struggles to distribute the workload evenly across the large number of GPU threads and incurs irregular memory accesses. The binary-search based approach often suffers from high time complexity. Furthermore, both approaches require sorted neighbor lists from the input graphs, which involves nontrivial preprocessing overhead. To this end, we introduce H-INDEX, a hash-indexing assisted triangle counting algorithm that overcomes all the aforementioned shortcomings. Notably, H-INDEX achieves a computing rate of 141.399 billion traversed edges per second (TEPS) on a Protein K-mer V2a graph with 64 GPUs. To the best of our knowledge, this is the first work that advances triangle counting beyond the 100 billion TEPS rate.
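
The idea of replacing sorted-list intersection with hash-based intersection can be illustrated with a minimal sequential sketch. This is not the H-INDEX GPU implementation: the function names, the modulo hash, and the bucket count are assumptions chosen for illustration only, and the sketch reports the total number of triangles rather than per-vertex counts. It shows only that hashing one neighbor list into buckets and probing it with the other list requires no sorted input.

```python
from collections import defaultdict


def build_hash_buckets(neighbors, num_buckets):
    """Place each neighbor ID in bucket (id % num_buckets); no sorting required."""
    buckets = [[] for _ in range(num_buckets)]
    for v in neighbors:
        buckets[v % num_buckets].append(v)
    return buckets


def hash_intersect_count(list_a, list_b, num_buckets=4):
    """Count common elements: hash the shorter list, probe with the longer one."""
    if len(list_a) > len(list_b):
        list_a, list_b = list_b, list_a
    buckets = build_hash_buckets(list_a, num_buckets)
    count = 0
    for v in list_b:
        # Only the one bucket that v hashes to has to be scanned.
        if v in buckets[v % num_buckets]:
            count += 1
    return count


def count_triangles(edges, num_buckets=4):
    """Count each triangle once by orienting edges from lower to higher vertex ID."""
    # Assumption: the graph is undirected; self-loops and duplicate edges are dropped.
    oriented = {(min(u, v), max(u, v)) for u, v in edges if u != v}
    adj = defaultdict(list)
    for lo, hi in oriented:
        adj[lo].append(hi)  # neighbor lists stay in arbitrary (unsorted) order
    total = 0
    for u in list(adj):
        for v in adj[u]:
            # Common higher-ID neighbors of u and v each close one triangle.
            total += hash_intersect_count(adj[u], adj.get(v, []), num_buckets)
    return total
```

For example, `count_triangles([(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)])` processes the complete graph on four vertices, which contains four triangles. In a GPU setting the probes of the longer list would be spread across threads, but how H-INDEX organizes its buckets and threads is not described in this abstract.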
