Hardware/Software Vectorization for Closeness Centrality on Multi-/Many-Core Architectures

Centrality metrics have been shown to be highly correlated with the importance and loads of the nodes in a network. Given the scale of today's social networks, efficient algorithms and high-performance computing techniques are essential for computing them quickly. In this work, we exploit hardware and software vectorization in combination with fine-grained parallelization to compute closeness centrality values. The proposed vectorization approach lets us run multiple breadth-first searches concurrently and significantly increases performance. We compare different vectorization schemes and experimentally evaluate our contributions against existing parallel CPU-based solutions on cutting-edge hardware. Our implementations are 11 times faster than the state-of-the-art implementation for a graph with 234 million edges. The proposed techniques also show how vectorization can be used efficiently to execute other graph kernels that require multiple traversals over a large-scale network on cutting-edge architectures.
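The idea of running several breadth-first searches at once can be illustrated with a small software-only sketch. This is not the implementation evaluated in this work: it assumes an unweighted, undirected graph given as adjacency lists, packs one bit per BFS source into a Python integer in place of a hardware vector register, and takes closeness as the reciprocal of the sum of distances to reachable vertices. The function name `closeness_batched` and the `batch` parameter are hypothetical.

```python
def closeness_batched(adj, batch=64):
    """Closeness centrality via batched, bit-parallel BFS (illustrative sketch).

    adj   -- adjacency lists of an unweighted, undirected graph
    batch -- number of BFS sources processed together (one bit each)
    """
    n = len(adj)
    far = [0] * n  # far[s] = sum of distances from s to every vertex it reaches

    for start in range(0, n, batch):
        sources = range(start, min(start + batch, n))
        visited = [0] * n   # bit b of visited[v]: source start+b has reached v
        frontier = [0] * n  # bit b of frontier[v]: v is on the frontier of start+b
        for bit, s in enumerate(sources):
            visited[s] |= 1 << bit
            frontier[s] |= 1 << bit

        level = 0
        while any(frontier):
            level += 1
            nxt = [0] * n
            # One pass over the edges advances every BFS in the batch.
            for v in range(n):
                f = frontier[v]
                if f:
                    for w in adj[v]:
                        nxt[w] |= f
            for w in range(n):
                new = nxt[w] & ~visited[w]  # sources reaching w for the first time
                nxt[w] = new
                visited[w] |= new
                # Credit the current level to each source whose bit was just set.
                while new:
                    low = new & -new
                    far[start + low.bit_length() - 1] += level
                    new ^= low
            frontier = nxt

    # Assumed convention: isolated vertices get closeness 0.
    return [1.0 / f if f else 0.0 for f in far]
```

For the path graph with adjacency lists `[[1], [0, 2], [1]]`, the distance sums are 3, 2, and 3, so the middle vertex has the highest closeness. The point of the layout is that a single bitwise OR per edge serves all sources in the batch, which is the kind of operation that maps onto wide vector instructions.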
