Rapid development of parallel blocked all-pairs shortest paths code for multi-core computers

In this paper, we present the rapid development of efficient multi-core parallel code for the blocked Floyd-Warshall algorithm using the C++ library gae.h [4], on top of the Intel C++ Threading Building Blocks (TBB) library. We demonstrate that developing parallel multi-core code with gae.h is easy and fast, and that the resulting code is efficient. On an 80-core HP ProLiant DL980 G7 machine, the efficiency of the parallel code for the blocked Floyd-Warshall algorithm is above 70% when fewer than 70 cores are used and above 90% when fewer than 20 cores are used.
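
For readers unfamiliar with the blocked formulation, the following is a minimal serial sketch of a blocked Floyd-Warshall computation in standard C++. It is not the paper's code and does not use gae.h or TBB. The names `Matrix`, `INF`, `relax_block`, and `blocked_floyd_warshall` are hypothetical and chosen only for illustration, and the sketch assumes a dense distance matrix with no negative cycles. Each outer iteration has three phases: the diagonal block, then the blocks in the same block row and block column, then all remaining blocks.

```cpp
#include <algorithm>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical types and names, used only for this illustration.
using Matrix = std::vector<std::vector<double>>;
const double INF = std::numeric_limits<double>::infinity();

// Relax every entry of block (bi, bj) through the intermediate
// vertices that belong to block bk. Blocks have side length b;
// the last block in each dimension may be smaller.
void relax_block(Matrix& d, std::size_t b,
                 std::size_t bi, std::size_t bj, std::size_t bk) {
    const std::size_t n = d.size();
    const std::size_t i_end = std::min(n, (bi + 1) * b);
    const std::size_t j_end = std::min(n, (bj + 1) * b);
    const std::size_t k_end = std::min(n, (bk + 1) * b);
    for (std::size_t k = bk * b; k < k_end; ++k)
        for (std::size_t i = bi * b; i < i_end; ++i)
            for (std::size_t j = bj * b; j < j_end; ++j)
                d[i][j] = std::min(d[i][j], d[i][k] + d[k][j]);
}

// Blocked Floyd-Warshall on an n x n distance matrix, updated in place.
void blocked_floyd_warshall(Matrix& d, std::size_t b) {
    const std::size_t n = d.size();
    const std::size_t nb = (n + b - 1) / b;  // number of blocks per dimension
    for (std::size_t kb = 0; kb < nb; ++kb) {
        // Phase 1: the diagonal block depends only on itself.
        relax_block(d, b, kb, kb, kb);
        // Phase 2: blocks in block row kb and block column kb.
        for (std::size_t jb = 0; jb < nb; ++jb)
            if (jb != kb) relax_block(d, b, kb, jb, kb);
        for (std::size_t ib = 0; ib < nb; ++ib)
            if (ib != kb) relax_block(d, b, ib, kb, kb);
        // Phase 3: all remaining blocks.
        for (std::size_t ib = 0; ib < nb; ++ib)
            for (std::size_t jb = 0; jb < nb; ++jb)
                if (ib != kb && jb != kb) relax_block(d, b, ib, jb, kb);
    }
}
```

Within phase 2 and within phase 3, the block updates write to disjoint blocks and read only blocks completed in an earlier phase, so those loops are the natural candidates for parallel execution (for example with a parallel loop construct such as TBB's `tbb::parallel_for`). How gae.h expresses this is not shown here.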

[1] Stanislav G. Sedukhin, et al. A Solution of the All-Pairs Shortest Paths Problem on the Cell Broadband Engine Processor, 2009, IEICE Trans. Inf. Syst.

[2] Ma Jian, et al. A Parallel Floyd-Warshall algorithm based on TBB, 2010, 2010 2nd IEEE International Conference on Information Management and Engineering.

[3] Arch D. Robison, et al. Structured Parallel Programming: Patterns for Efficient Computation, 2012.

[4] Ronald L. Rivest, et al. Introduction to Algorithms, 3rd Edition, 2009.

[5] Ruppa K. Thulasiram, et al. Task Level Parallelization of All Pair Shortest Path Algorithm in OpenMP 3.0, 2013, CSE 2013.

[6] Uday Bondhugula, et al. Parallel FPGA-based all-pairs shortest-paths in a directed graph, 2006, Proceedings 20th IEEE International Parallel & Distributed Processing Symposium.

[7] Robert A. van de Geijn, et al. Families of algorithms related to the inversion of a Symmetric Positive Definite matrix, 2008, TOMS.

[8] Peiyi Tang. A C++ library for rapid development of efficient parallel dense linear algebra codes for multicore computers, 2013, ACMSE '13.

[9] Margreet Louter-Nool. Block-Cholesky for parallel processing, 1992.

[10] Ling Shang, et al. A New Parallel Paradigm for Block-Based Gauss-Jordan Algorithm, 2009, 2009 Eighth International Conference on Grid and Cooperative Computing.

[11] James Reinders, et al. Intel threading building blocks - outfitting C++ for multi-core processor parallelism, 2007.

[12] Viktor K. Prasanna, et al. Tiling, Block Data Layout, and Memory Hierarchy Performance, 2003, IEEE Trans. Parallel Distributed Syst.

[14] Sartaj Sahni, et al. A blocked all-pairs shortest-paths algorithm, 2003, ACM J. Exp. Algorithmics.

[15] Joseph T. Kider, et al. All-pairs shortest-paths for large graphs on the GPU, 2008, GH '08.

[16] Jack J. Dongarra, et al. Scheduling dense linear algebra operations on multicore processors, 2010, Concurr. Comput. Pract. Exp.

[17] Stanislav G. Sedukhin, et al. Blocked All-Pairs Shortest Paths Algorithm for Hybrid CPU-GPU System, 2011, 2011 IEEE International Conference on High Performance Computing and Communications.

[18] Conrad Sanderson, et al. Armadillo: An Open Source C++ Linear Algebra Library for Fast Prototyping and Computationally Intensive Experiments, 2010.