Which Graph Representation to Select for Static Graph-Algorithms on a CUDA-capable GPU

GPUs seem to be ideal for algorithms that work in parallel. A number of ways to represent graphs in GPU memory are known, but so far there are no guidelines for selecting the representation that is likely to result in the best performance. This comprehensive study investigates, for CUDA-capable GPUs, how different graph representations influence the performance of highly optimized graph processing algorithms that traverse graphs without modifying them. We evaluate three different graph exchange formats and how efficiently they can be imported into eight graph data structures. We use ten state-of-the-art benchmarks that employ different traversal patterns, and we evaluate them on 19 input graphs with different characteristics. The measurements show that there is no single best data structure; the runtime performance can vary by up to a factor of 2 between two representations. The main contribution is a set of rules that helps in picking the best-performing graph representation for a given situation.
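
To make the notion of a graph representation concrete, the following minimal sketch converts an edge list into a compressed sparse row (CSR) layout. The abstract does not name the eight data structures that were studied, so the choice of these two layouts is an assumption made purely for illustration, and the function names `edges_to_csr` and `neighbors` are hypothetical. The sketch runs on the host in plain Python; it does not show GPU code.

```python
from typing import List, Tuple


def edges_to_csr(num_vertices: int,
                 edges: List[Tuple[int, int]]) -> Tuple[List[int], List[int]]:
    """Build CSR arrays (offsets, targets) from a directed edge list.

    Illustrative only: CSR is assumed here as one common representation,
    not necessarily one of the structures evaluated in the study.
    """
    # Count the out-degree of every vertex, shifted by one position.
    offsets = [0] * (num_vertices + 1)
    for src, _ in edges:
        offsets[src + 1] += 1

    # Prefix sum turns the counts into start offsets per vertex.
    for v in range(num_vertices):
        offsets[v + 1] += offsets[v]

    # Scatter each edge target into the slot range of its source vertex.
    targets = [0] * len(edges)
    cursor = offsets[:-1]  # slicing copies, so offsets stays unchanged
    for src, dst in edges:
        targets[cursor[src]] = dst
        cursor[src] += 1
    return offsets, targets


def neighbors(offsets: List[int], targets: List[int], v: int) -> List[int]:
    """Return the out-neighbors of v as one contiguous slice."""
    return targets[offsets[v]:offsets[v + 1]]


# Small example graph with 4 vertices and 5 directed edges.
edges = [(0, 1), (0, 2), (1, 2), (2, 0), (2, 3)]
offsets, targets = edges_to_csr(4, edges)
```

In the edge list, finding the neighbors of one vertex requires scanning all edges, whereas CSR stores them contiguously. Trade-offs of this kind, between import cost and traversal cost, are what a comparison of representations has to weigh.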
