Paper information - Computing the Pseudo-Inverse of a Graph's Laplacian Using GPUs

Many applications in network analysis require the computation of the network's Laplacian pseudo-inverse - e.g., topological centrality in social networks or estimating commute times in electrical networks. As large graphs become ubiquitous, the traditional approaches - with quadratic or cubic complexity in the number of vertices - do not scale. To alleviate this performance issue, a divide-and-conquer approach has recently been developed. In this work, we take one step further in improving the performance of computing the pseudo-inverse of the Laplacian through parallelization. Specifically, we propose a parallel, GPU-based version of this new divide-and-conquer method. Furthermore, we implement this solution in MATLAB, a native environment for such computations, recently enhanced with the ability to harness the computational capabilities of GPUs. We find that, using GPUs through MATLAB, we achieve speed-ups of up to 320x compared with the sequential divide-and-conquer solution. We further compare this GPU-enabled version with three other parallel solutions: a parallel CPU implementation and a CUDA-based implementation of the divide-and-conquer algorithm, as well as a GPU-based implementation that uses cuBLAS to compute the pseudo-inverse in the traditional way. We find that the GPU-based implementation outperforms the parallel CPU version significantly. Furthermore, our results demonstrate that a single best GPU-based implementation does not exist: depending on the size and structure of the graph, the relative performance of the three GPU-based versions can differ significantly. We conclude that GPUs can be successfully used to improve the performance of computing the pseudo-inverse of a graph's Laplacian, but choosing the best-performing solution remains challenging due to the non-trivial correlation between the achieved performance and the characteristics of the input graph. Our future work attempts to expose and exploit this correlation.
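For readers unfamiliar with the quantity being computed, the following is a minimal sketch of the traditional, cubic-cost baseline that the abstract contrasts against. It is not the paper's divide-and-conquer method, nor any of its GPU implementations. It assumes a connected, undirected, unweighted graph and uses the standard identity L^+ = (L + J/n)^{-1} - J/n, where J is the all-ones matrix. All function names are hypothetical and chosen for illustration only.

```python
def laplacian(n, edges):
    """Dense Laplacian L = D - A of an undirected, unweighted graph."""
    L = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        L[u][u] += 1.0
        L[v][v] += 1.0
        L[u][v] -= 1.0
        L[v][u] -= 1.0
    return L


def invert(A):
    """Gauss-Jordan inversion with partial pivoting; O(n^3) time."""
    n = len(A)
    # Augment A with the identity matrix: [A | I].
    M = [row[:] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular; is the graph connected?")
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [x / p for x in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0.0:
                f = M[r][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    # The right half now holds the inverse of A.
    return [row[n:] for row in M]


def laplacian_pinv(n, edges):
    """Pseudo-inverse of the Laplacian, assuming the graph is connected."""
    L = laplacian(n, edges)
    shift = 1.0 / n  # every entry of J/n
    shifted = [[L[i][j] + shift for j in range(n)] for i in range(n)]
    inv = invert(shifted)
    return [[inv[i][j] - shift for j in range(n)] for i in range(n)]


def effective_resistance(Lp, i, j):
    """Resistance distance between vertices i and j, read off from L^+."""
    return Lp[i][i] + Lp[j][j] - 2.0 * Lp[i][j]


if __name__ == "__main__":
    # Path graph 0 - 1 - 2 with unit-weight edges.
    Lp = laplacian_pinv(3, [(0, 1), (1, 2)])
    print(effective_resistance(Lp, 0, 2))
```

The effective resistance computed from L^+ is proportional to the commute time between the two vertices, which is one reason the pseudo-inverse appears in the applications the abstract lists. The dense inversion step is what makes this approach scale poorly on large graphs.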
