Blocked All-Pairs Shortest Paths Algorithm for Hybrid CPU-GPU System

This paper presents a blocked algorithm for the all-pairs shortest paths (APSP) problem on a hybrid CPU-GPU system. The blocked APSP algorithm minimizes the amount of data communicated between CPU (host) memory and GPU memory. When the problem size (the number of vertices in the graph) is sufficiently large compared with the blocking factor, the blocked algorithm requires, in effect, a CPU$\rightleftharpoons$GPU exchange of two block matrices for each block computation on the GPU. We also estimate the memory and communication bandwidth required to utilize the GPU efficiently. On a system containing an Intel Westmere CPU (Core i7 970) and an AMD Cypress GPU (Radeon HD 5870), our implementation of the blocked APSP algorithm achieves a performance of up to 1 TFlop/s in single precision.
