Towards Efficient Graph Traversal using a Multi-GPU Cluster

Graph processing has always been a challenge, as there are inherent complexities in it. These include scalability to larger data sets and clusters, dependencies between vertices in the graph, irregular memory accesses during processing and traversals, minimal locality of reference, etc. In literature, there are several implementations for parallel graph processing on single GPU systems but only few for single and multi-node multi-GPU systems. In this paper, the prospects of improvement in large graph traversals by utilizing multi-GPU cluster for Breadth First Search algorithm has been studied. In this regard, a DiGPU, a CUDA-based implementation for graph traversal in shared memory multi-GPU and distributed memory multi-GPU systems has been proposed. In this work, an open source software module has also been developed and verified through set of experiments. Further, evaluations have been demonstrated on local cluster as well as on CDER cluster. Finally, experimental analysis has been performed on several graph data sets using different system configurations to study the impact of load distribution with respect to GPU specification on performance of our implementation.

[1]  Massimo Bernaschi,et al.  Enhanced GPU-based distributed breadth first search , 2015, Conf. Computing Frontiers.

[2]  Massimo Bernaschi,et al.  Efficient breadth first search on multi-GPU systems , 2013, J. Parallel Distributed Comput..

[3]  Jonathan W. Berry,et al.  Challenges in Parallel Graph Processing , 2007, Parallel Process. Lett..

[4]  Kunle Olukotun,et al.  Efficient Parallel Graph Exploration on Multi-Core CPU and GPU , 2011, 2011 International Conference on Parallel Architectures and Compilation Techniques.

[5]  Martin D. F. Wong,et al.  An effective GPU implementation of breadth-first search , 2010, Design Automation Conference.

[6]  Andrew S. Grimshaw,et al.  Scalable GPU graph traversal , 2012, PPoPP '12.

[7]  P. J. Narayanan,et al.  Accelerating Large Graph Algorithms on the GPU Using CUDA , 2007, HiPC.

[8]  Andrew S. Grimshaw,et al.  High-Performance and Scalable GPU Graph Traversal , 2015, ACM Trans. Parallel Comput..

[9]  P. J. Narayanan,et al.  Large Graph Algorithms for Massively Multithreaded Architectures , 2009 .

[10]  Nouman M. Durrani,et al.  Volunteer computing: requirements, challenges, and solutions , 2014, J. Netw. Comput. Appl..

[11]  Kunle Olukotun,et al.  Accelerating CUDA graph algorithms at maximum warp , 2011, PPoPP '11.

[12]  John D. Owens,et al.  Gunrock: a high-performance graph processing library on the GPU , 2015, PPoPP.

[13]  Sumit Kumar,et al.  A modified parallel approach to Single Source Shortest Path Problem for massively dense graphs using CUDA , 2011, 2011 2nd International Conference on Computer and Communication Technology (ICCCT-2011).

[14]  Jure Leskovec,et al.  {SNAP Datasets}: {Stanford} Large Network Dataset Collection , 2014 .