Paper Information - The Algebraic Path Problem on the Cell/B.E. Processor

Authors: Kazuya Matsumoto, Stanislav G. Sedukhin
Affiliation: Distributed Parallel Processing Laboratory, The University of Aizu, Aizu-Wakamatsu, Fukushima 965-8580, Japan
Report Date: 11/30/2010
Written Language: English
Distribution Statement: First Issue: 10 copies
Manuscript submitted to Journal of Information Processing (IPSJ JIP)
Keywords: algebraic path problem, all-pairs shortest paths problem, Cell Broadband Engine, performance evaluation, parallel computing

Abstract: The Algebraic Path Problem (APP) unifies well-known matrix, graph, and language problems, such as matrix inversion, all-pairs shortest paths (APSP), maximum capacity paths (MCP), minimum spanning tree, and generation of regular languages, into a single algorithmic scheme. APP instances differ only in the underlying algebraic structure. This paper explores the APP and presents an implementation of a block algorithm for solving the APP on the Cell Broadband Engine (Cell/B.E.) heterogeneous multicore processor. The block APP algorithm spends most of its computing time in a block matrix-matrix multiply-add (MMA) operation, which is carried out in different algebras. In our APP algorithm, a fast dense MMA operation in the linear (+,×)-algebra is utilized. The MMA implementation on the Cell/B.E. needs only a single fused multiply-add (FMA) instruction to obtain a single short-vector (+,×)-result in one cycle. APP instances such as the APSP and MCP problems are based on the (min,+)- and (max,min)-algebras, respectively. These algebras differ from the linear (+,×)-algebra and require three and four instructions, respectively, to obtain a single short-vector result in three and four cycles. Because of that, the maximum sustained performance of the MMA operation on the Cell/B.E. is 152 Gflop/s in the linear algebra, whereas for APSP and MCP it is 50.7 Gflop/s and 38.1 Gflop/s, respectively.
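The idea that one multiply-add kernel serves several problems by swapping the algebra can be illustrated with a minimal Python sketch. It is not the authors' block algorithm or their Cell/B.E. code: the names `Algebra`, `mma`, and `closure` are hypothetical, the matrices are tiny made-up examples, and repeated squaring is used as a simple stand-in for the block APP algorithm. Missing edges are assumed to be encoded as infinity for (min,+) and as 0 for (max,min).

```python
from dataclasses import dataclass
from math import inf
from typing import Callable, List

Matrix = List[List[float]]


@dataclass(frozen=True)
class Algebra:
    """A pair of scalar operations that replaces (+, x) in the MMA kernel."""
    add: Callable[[float, float], float]
    mul: Callable[[float, float], float]


# The three algebras named in the abstract.
LINEAR = Algebra(add=lambda x, y: x + y, mul=lambda x, y: x * y)  # (+, x)
MIN_PLUS = Algebra(add=min, mul=lambda x, y: x + y)               # APSP
MAX_MIN = Algebra(add=max, mul=min)                               # MCP


def mma(c: Matrix, a: Matrix, b: Matrix, alg: Algebra) -> Matrix:
    """Return C add (A mul B), with add/mul taken from the given algebra."""
    out = [row[:] for row in c]
    for i in range(len(a)):
        for j in range(len(b[0])):
            acc = out[i][j]
            for k in range(len(b)):
                acc = alg.add(acc, alg.mul(a[i][k], b[k][j]))
            out[i][j] = acc
    return out


def closure(m: Matrix, alg: Algebra) -> Matrix:
    """Repeated squaring: after s steps, paths of up to 2**s edges are covered."""
    result = [row[:] for row in m]
    hops = 1
    while hops < len(m) - 1:
        result = mma(result, result, result, alg)
        hops *= 2
    return result


# Edge weights of a 3-vertex graph; zero diagonal, inf means "no edge".
weights = [[0, 4, 11],
           [6, 0, 2],
           [3, inf, 0]]
shortest = closure(weights, MIN_PLUS)

# Edge capacities of the same graph; inf diagonal, 0 means "no edge".
capacities = [[inf, 4, 11],
              [6, inf, 2],
              [3, 0, inf]]
widest = closure(capacities, MAX_MIN)

print(shortest)  # [[0, 4, 6], [5, 0, 2], [3, 7, 0]]
print(widest)    # [[inf, 4, 11], [6, inf, 6], [3, 3, inf]]
```

Only the `Algebra` argument changes between the linear, shortest-path, and maximum-capacity cases. This mirrors the abstract's point that the per-element cost of the kernel (one FMA instruction versus three or four instructions) depends on the algebra rather than on the loop structure.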
