Improving Global Performance on GPU for Algorithms with Main Loop Containing a Reduction Operation: Case of Dijkstra's Algorithm

In this paper, we study the impact of data copies in GPU computing. GPU computing makes parallel computation available at low cost: a GPU can be purchased for under USD 500. Many studies have shown that GPUs can be used to speed up calculations. However, for algorithms that alternate between computing on the GPU and computing on the CPU, the latency of copying data between the two is a source of performance degradation. To illustrate this, we consider Dijkstra's shortest-path algorithm, which is used in solving optimization problems. This algorithm is expensive to run on a sequential machine, so we consider a parallel approach on GPU. Note that Dijkstra's algorithm has been the subject of many GPU implementations. In the present work, we use two platforms with an external GPU. Graphs are represented as adjacency matrices. During the computation, intermediate results are copied from GPU to CPU or from CPU to GPU. The purpose of this work is to measure the impact of these copies on the overall performance of the algorithm. To that end, we measure the time attributable to copying in an implementation that copies data, and then compare the results with an implementation that computes only on CPU memory (zero-copy). The impact observed in the experiments demonstrates the interest of this study. GPGPU programmers must decide whether to use zero-copy memory or GPU memory. The challenge for GPU manufacturers is to reduce this impact.
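The loop structure described above can be illustrated with a minimal sequential sketch in Python. It is not the implementation evaluated in this work: the function name `dijkstra_adjacency_matrix`, the `transfers` counter, and the assumption of exactly two host/device copies per iteration are hypothetical and serve only to show where a reduction inside the main loop would force data movement if the reduction and the relaxation ran on different processors.

```python
import math


def dijkstra_adjacency_matrix(weights, source):
    """Dijkstra on a dense adjacency matrix; math.inf marks a missing edge.

    Returns the distance list and a hypothetical count of host/device
    copies, assuming one copy after the reduction and one before the
    relaxation in every iteration of the main loop.
    """
    n = len(weights)
    dist = [math.inf] * n
    dist[source] = 0.0
    visited = [False] * n
    transfers = 0  # illustrative counter, not a measured quantity

    for _ in range(n):
        # Reduction step: find the closest unvisited vertex.
        u, best = -1, math.inf
        for v in range(n):
            if not visited[v] and dist[v] < best:
                u, best = v, dist[v]
        if u == -1:
            break  # every remaining vertex is unreachable
        transfers += 1  # assumed copy of the reduction result

        visited[u] = True

        # Relaxation step: independent per vertex, hence data parallel.
        for v in range(n):
            candidate = best + weights[u][v]
            if not visited[v] and candidate < dist[v]:
                dist[v] = candidate
        transfers += 1  # assumed copy of the selected vertex back

    return dist, transfers
```

Under these assumptions the number of copies grows linearly with the number of vertices, which is why the per-copy latency matters for the overall run time. With zero-copy memory the counter would correspond to accesses to mapped host memory rather than explicit copies.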
