The All‐Pair Shortest‐Path Problem in Shared‐Memory Heterogeneous Systems

This chapter addresses the All-Pair Shortest-Path problem for sparse graphs by combining parallel algorithms and parallel-productivity methods in heterogeneous systems. Because this problem can be divided into independent Single-Source Shortest-Path subproblems, we distribute the computation space among the different processing units, CPUs and graphics processing units (GPUs), that are usually present in modern shared-memory systems. Although the GPUs are significantly faster than the CPUs, their combined use leads to better execution times. Two scheduling policies are used: an equitable scheduling, where the workspace is divided equally among all computational units regardless of their type, and a work-stealing scheduling, where a computational unit takes a new task as soon as it has finished its previous one.
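The decomposition and the two scheduling policies described above can be illustrated with a minimal sketch. It is not the chapter's implementation: plain Python threads stand in for the CPU and GPU units, Dijkstra's algorithm is assumed as the Single-Source Shortest-Path solver, the work-stealing policy is approximated by a shared task queue, and the names `sssp`, `apsp_equitable`, and `apsp_work_stealing` are hypothetical.

```python
import heapq
import threading
from queue import Queue, Empty


def sssp(adj, source):
    """Dijkstra from one source; adj maps each node to a list of (neighbor, weight)."""
    dist = {u: float("inf") for u in adj}
    dist[source] = 0
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry, a shorter path was already found
        for v, w in adj[u]:
            nd = d + w
            if nd < dist[v]:
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def apsp_equitable(adj, n_units):
    """Equitable policy (assumed form): each unit gets an equal share of the sources up front."""
    sources = sorted(adj)
    shares = [sources[i::n_units] for i in range(n_units)]
    result = {}

    def worker(share):
        for s in share:
            result[s] = sssp(adj, s)  # each source is written by exactly one unit

    threads = [threading.Thread(target=worker, args=(share,)) for share in shares]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result


def apsp_work_stealing(adj, n_units):
    """Dynamic policy (assumed form): a unit takes the next source whenever it becomes idle."""
    tasks = Queue()
    for s in adj:
        tasks.put(s)
    result = {}

    def worker():
        while True:
            try:
                s = tasks.get_nowait()
            except Empty:
                return  # no sources left
            result[s] = sssp(adj, s)

    threads = [threading.Thread(target=worker) for _ in range(n_units)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result


if __name__ == "__main__":
    # Small directed example graph, chosen only for illustration.
    graph = {0: [(1, 4), (2, 1)], 1: [(3, 1)], 2: [(1, 2), (3, 5)], 3: []}
    print(apsp_equitable(graph, 2)[0])
    print(apsp_work_stealing(graph, 2)[0])
```

In a setting where units differ greatly in speed, the fixed shares of the equitable policy can leave a fast unit idle while a slow one finishes, whereas the dynamic policy lets the faster unit absorb more sources. Both functions return the same distance table.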
