TurboBC: A Memory Efficient and Scalable GPU Based Betweenness Centrality Algorithm in the Language of Linear Algebra

Betweenness centrality (BC) is a shortest path centrality metric used to measure the influence of individual vertices or edges on huge graphs that are used for modeling and analysis of human brain, omics data, or social networks. The application of the BC algorithm to modern graphs must deal with the size of the graphs, as well with highly irregular data-access patterns. These challenges are particularly important when the BC algorithm is implemented on Graphics Processing Units (GPU), due to the limited global memory of these processors, as well as the decrease in performance due to the load unbalance resulting from processing irregular data structures. In this paper, we present the first GPU based linear-algebraic formulation and implementation of BC, called TurboBC, a set of memory efficient BC algorithms that exhibits good performance and high scalability on unweighted, undirected or directed sparse graphs of arbitrary structure. Our experiments demonstrate that our TurboBC algorithms obtain more than 18 GTEPs and an average speedup of 31.9x over the sequential version of the BC algorithm, and are on average 1.7x and 2.2x faster than the state-of-the-art algorithms implemented on the high performance, GPU-based, gunrock, and CPU-based, ligra libraries, respectively. These experiments also show that by minimizing their memory footprint, the TurboBC algorithms are able to compute the BC of relatively big graphs, for which the gunrock algorithms ran out of memory.

[1]  Ulrik Brandes,et al.  On variants of shortest-path betweenness centrality and their generic computation , 2008, Soc. Networks.

[2]  Alex Bavelas A Mathematical Model for Group Structures , 1948 .

[3]  Ümit V. Çatalyürek,et al.  Betweenness centrality on GPUs and heterogeneous architectures , 2013, GPGPU@ASPLOS.

[4]  Fahad Saeed,et al.  TurboBFS: GPU Based Breadth-First Search (BFS) Algorithms in the Language of Linear Algebra , 2021, 2021 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW).

[5]  Jared Hoberock,et al.  Edge v. Node Parallelism for Graph Centrality Metrics , 2012 .

[6]  John D. Owens,et al.  Gunrock: a high-performance graph processing library on the GPU , 2015, PPoPP.

[7]  Guy E. Blelloch,et al.  Ligra: a lightweight graph processing framework for shared memory , 2013, PPoPP '13.

[8]  Timothy A. Davis,et al.  The SuiteSparse Matrix Collection Website Interface , 2019, J. Open Source Softw..

[9]  John D. Owens,et al.  Multi-GPU Graph Analytics , 2015, 2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS).

[10]  Walter Willinger,et al.  Towards a Theory of Scale-Free Graphs: Definition, Properties, and Implications , 2005, Internet Math..

[11]  John West,et al.  Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis , 2016, SC.

[12]  David A. Bader,et al.  Scalable and High Performance Betweenness Centrality on the GPU , 2014, SC14: International Conference for High Performance Computing, Networking, Storage and Analysis.

[13]  Scott McMillan,et al.  Design of the GraphBLAS API for C , 2017, 2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW).

[14]  Leonard M. Freeman,et al.  A set of measures of centrality based upon betweenness , 1977 .

[15]  Rok Sosic,et al.  SNAP , 2016, ACM Trans. Intell. Syst. Technol..

[16]  Mona Singh,et al.  Computational solutions for omics data , 2013, Nature Reviews Genetics.

[17]  Olaf Sporns,et al.  Complex network measures of brain connectivity: Uses and interpretations , 2010, NeuroImage.

[18]  Bing Zhang,et al.  Fast network centrality analysis using GPUs , 2011, BMC Bioinformatics.

[19]  Tinkara Toš,et al.  Graph Algorithms in the Language of Linear Algebra , 2012, Software, environments, tools.

[20]  John Skvoretz,et al.  Node centrality in weighted networks: Generalizing degree and shortest paths , 2010, Soc. Networks.

[21]  Michael Garland,et al.  Efficient Sparse Matrix-Vector Multiplication on CUDA , 2008 .