An Efficient Implementation of the Bellman-Ford Algorithm for Kepler GPU Architectures

Finding the shortest paths from a single source to all other vertices is a common problem in graph analysis. The Bellman-Ford algorithm solves this single-source shortest path (SSSP) problem and is particularly well suited to parallelization on many-core architectures. However, this high degree of parallelism comes at the cost of low work efficiency: compared to similar algorithms in the literature (e.g., Dijkstra's), Bellman-Ford performs much more redundant work and consequently wastes power. This article presents a parallel implementation of the Bellman-Ford algorithm that exploits the architectural characteristics of recent GPU architectures (i.e., NVIDIA Kepler, Maxwell) to improve both performance and work efficiency. The article presents several optimizations of the implementation, oriented both to the algorithm and to the architecture. The experimental results show that the proposed implementation achieves an average speedup of 5× over the most efficient existing parallel implementations for SSSP, that it works on graphs where those implementations either cannot work or are inefficient (e.g., graphs with negative-weight edges, sparse graphs), and that it significantly reduces the redundant work caused by the parallelization process.
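To make the notion of redundant work concrete, the following is a minimal sequential Python sketch. It is not the article's GPU implementation. It compares the textbook Bellman-Ford formulation, which relaxes every edge in every round, with a frontier-based variant that relaxes only the outgoing edges of vertices whose distance changed in the previous round. The frontier variant is a common technique assumed here purely for illustration. The function names, the example graph, and the relaxation counter are all hypothetical, and the sketch assumes that no negative-weight cycle is reachable from the source.

```python
from collections import defaultdict

INF = float("inf")


def bellman_ford_all_edges(num_vertices, edges, source):
    """Textbook Bellman-Ford: relax every edge in every round.

    Returns (dist, relaxations), where relaxations counts edge checks.
    Assumes no negative-weight cycle is reachable from the source.
    """
    dist = [INF] * num_vertices
    dist[source] = 0
    relaxations = 0
    for _ in range(num_vertices - 1):
        changed = False
        for u, v, w in edges:
            relaxations += 1
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                changed = True
        if not changed:  # early exit once a full round changes nothing
            break
    return dist, relaxations


def bellman_ford_frontier(num_vertices, edges, source):
    """Illustrative frontier-based variant (an assumption, not the paper's
    method): only vertices updated in the previous round are expanded."""
    adjacency = defaultdict(list)
    for u, v, w in edges:
        adjacency[u].append((v, w))

    dist = [INF] * num_vertices
    dist[source] = 0
    frontier = [source]
    relaxations = 0
    rounds = 0
    # At most num_vertices - 1 rounds are needed without negative cycles.
    while frontier and rounds < num_vertices - 1:
        next_frontier = set()
        for u in frontier:
            for v, w in adjacency[u]:
                relaxations += 1
                if dist[u] + w < dist[v]:
                    dist[v] = dist[u] + w
                    next_frontier.add(v)
        frontier = sorted(next_frontier)
        rounds += 1
    return dist, relaxations


# Hypothetical example graph with negative-weight edges and no negative cycle.
EXAMPLE_EDGES = [(0, 1, 4), (0, 2, 5), (1, 3, -2), (2, 1, -3), (3, 4, 3)]

if __name__ == "__main__":
    for solver in (bellman_ford_all_edges, bellman_ford_frontier):
        dist, count = solver(5, EXAMPLE_EDGES, 0)
        print(solver.__name__, dist, "relaxations:", count)
```

Both functions return the same distances, including across the negative-weight edges that Dijkstra's algorithm cannot handle. They differ only in how many edge relaxations they perform, which is the kind of redundant work the abstract refers to.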
