A Solution of the All-Pairs Shortest Paths Problem on the Cell Broadband Engine Processor

SUMMARY The All-Pairs Shortest Paths (APSP) problem is a graph problem that can be solved by a triply nested loop. The Cell Broadband Engine (Cell/B.E.) is a heterogeneous multi-core processor that offers high single-precision floating-point performance. This paper presents a solution of the APSP problem on the Cell/B.E. To maximize performance, we use a blocked algorithm for the APSP problem, which enables reuse of data in registers and exploits the memory hierarchy. We also describe several optimization techniques for an effective implementation of the APSP problem on the Cell/B.E. The implementation achieves 8.45 Gflop/s for the APSP problem using one Synergistic Processor Element (SPE) and 50.6 Gflop/s using six SPEs.
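To make the contrast between the triply nested loop and a blocked formulation concrete, the following is a minimal scalar sketch in Python. It is not the Cell/B.E. implementation described in the paper: it assumes the textbook Floyd-Warshall recurrence for the triply nested loop and a standard three-phase tiling for the blocked variant, and it omits SIMD, DMA transfers, and register-level tuning. The function names (`floyd_warshall`, `blocked_floyd_warshall`, `_update_block`) and the block-size parameter `b` are illustrative choices, not names from the source.

```python
import math

INF = math.inf  # marks "no edge" in the distance matrix


def floyd_warshall(dist):
    """Reference triply nested loop (assumed textbook Floyd-Warshall)."""
    n = len(dist)
    d = [row[:] for row in dist]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d


def _update_block(d, n, b, bi, bj, bk):
    """Relax block (bi, bj) through the intermediate vertices of block bk."""
    i0, j0, k0 = bi * b, bj * b, bk * b
    for k in range(k0, min(k0 + b, n)):
        for i in range(i0, min(i0 + b, n)):
            for j in range(j0, min(j0 + b, n)):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]


def blocked_floyd_warshall(dist, b):
    """Blocked variant: each round works on b-by-b tiles (hypothetical sketch)."""
    n = len(dist)
    d = [row[:] for row in dist]
    nb = (n + b - 1) // b  # number of blocks per dimension
    for bk in range(nb):
        # Phase 1: the diagonal block depends only on itself.
        _update_block(d, n, b, bk, bk, bk)
        # Phase 2: blocks in the same block row and block column.
        for t in range(nb):
            if t != bk:
                _update_block(d, n, b, bk, t, bk)
                _update_block(d, n, b, t, bk, bk)
        # Phase 3: all remaining blocks, which are mutually independent.
        for bi in range(nb):
            for bj in range(nb):
                if bi != bk and bj != bk:
                    _update_block(d, n, b, bi, bj, bk)
    return d


if __name__ == "__main__":
    g = [
        [0, 3, INF, 7],
        [8, 0, 2, INF],
        [5, INF, 0, 1],
        [2, INF, INF, 0],
    ]
    print(blocked_floyd_warshall(g, 2) == floyd_warshall(g))
```

In this sketch each tile update touches only three b-by-b tiles, which is the property that lets a blocked algorithm keep its working set in fast local storage. The phase 3 tile updates within one round do not depend on each other, so they are the natural unit to distribute across several processing elements.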
