Computing the Pseudo-Inverse of a Graph's Laplacian Using GPUs

Many applications in network analysis require the computation of the network's Laplacian pseudo-inverse, e.g., topological centrality in social networks or estimating commute times in electrical networks. As large graphs become ubiquitous, the traditional approaches, with quadratic or cubic complexity in the number of vertices, do not scale. To alleviate this performance issue, a divide-and-conquer approach has recently been developed. In this work, we take one step further in improving the performance of computing the pseudo-inverse of the Laplacian through parallelization. Specifically, we propose a parallel, GPU-based version of this new divide-and-conquer method. Furthermore, we implement this solution in MATLAB, a native environment for such computations, recently enhanced with the ability to harness the computational capabilities of GPUs. We find that, using GPUs through MATLAB, we achieve speed-ups of up to 320x compared with the sequential divide-and-conquer solution. We further compare this GPU-enabled version with three other parallel solutions: a parallel CPU implementation and a CUDA-based implementation of the divide-and-conquer algorithm, as well as a GPU-based implementation that uses cuBLAS to compute the pseudo-inverse in the traditional way. We find that the GPU-based implementation outperforms the parallel CPU version significantly. Furthermore, our results demonstrate that no single GPU-based implementation is best: depending on the size and structure of the graph, the relative performance of the three GPU-based versions can differ significantly. We conclude that GPUs can be successfully used to improve the performance of computing the pseudo-inverse of a graph's Laplacian, but choosing the best-performing solution remains challenging due to the non-trivial correlation between the achieved performance and the characteristics of the input graph. Our future work attempts to expose and exploit this correlation.
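To make the "traditional", cubic-cost computation concrete, the following is a minimal Python sketch, not the divide-and-conquer or GPU method of this work. It assumes a connected, undirected, unweighted graph and uses the standard identity L^+ = (L + J/n)^{-1} - J/n, where J is the all-ones matrix. The function names (`laplacian`, `invert`, `laplacian_pinv`, `effective_resistance`) are hypothetical and chosen only for illustration.

```python
from fractions import Fraction


def laplacian(n, edges):
    """Build the Laplacian L = D - A of an undirected, unweighted graph."""
    L = [[Fraction(0)] * n for _ in range(n)]
    for u, v in edges:
        L[u][u] += 1
        L[v][v] += 1
        L[u][v] -= 1
        L[v][u] -= 1
    return L


def invert(M):
    """Gauss-Jordan inversion with exact rationals; O(n^3) operations."""
    n = len(M)
    # Augment M with the identity matrix: [M | I].
    A = [list(row) + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(M)]
    for col in range(n):
        # Find a non-zero pivot; raises StopIteration if M is singular
        # (which happens here when the graph is disconnected).
        pivot = next(r for r in range(col, n) if A[r][col] != 0)
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        A[col] = [x / p for x in A[col]]
        for r in range(n):
            if r != col and A[r][col] != 0:
                f = A[r][col]
                A[r] = [x - f * y for x, y in zip(A[r], A[col])]
    return [row[n:] for row in A]


def laplacian_pinv(L):
    """Pseudo-inverse of a connected graph's Laplacian: (L + J/n)^-1 - J/n."""
    n = len(L)
    shift = Fraction(1, n)
    shifted = [[L[i][j] + shift for j in range(n)] for i in range(n)]
    inv = invert(shifted)
    return [[inv[i][j] - shift for j in range(n)] for i in range(n)]


def effective_resistance(Lp, i, j):
    """Resistance distance between vertices i and j, read off L^+."""
    return Lp[i][i] + Lp[j][j] - 2 * Lp[i][j]


# Example: the path graph 0 - 1 - 2.
Lp = laplacian_pinv(laplacian(3, [(0, 1), (1, 2)]))
print(effective_resistance(Lp, 0, 2))
```

The commute time between two vertices is the effective resistance multiplied by twice the number of edges, which is why the pseudo-inverse appears in commute-time estimation. The dense inversion step is the cubic bottleneck that motivates the divide-and-conquer and GPU approaches discussed above.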
