A GPU-Parallel Algorithm for Fast Hybrid BFS-DFS Graph Traversal

Using GPUs (Graphics Processing Units) for analytics on big graphs seems natural, given the notable boost they have brought to high-performance computing and the huge volume of connected data now being gathered and processed. This paper proposes a parallel strategy to speed up the visit of all nodes of a graph, based on the precomputation of critical frontiers: step by step, the critical frontiers are reused so that all threads work optimally. The resulting algorithm, called HBDFS, is an asynchronous hybrid between Breadth-First Search (BFS) and Depth-First Search (DFS). Tests on both real and synthetic heterogeneous datasets show that the proposed approach consistently outperforms the baseline parallel BFS, achieving up to a 30-fold speed-up with just a 20% memory overhead.
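For orientation, the frontier-based traversal that a baseline parallel BFS builds on can be sketched sequentially as below. This is only an illustrative sketch, not the HBDFS algorithm and not the authors' implementation: the function name `frontier_bfs` and the adjacency-dictionary input are assumptions made for this example, and the precomputation and reuse of critical frontiers is not shown. In a GPU version, the loop over the current frontier would typically be distributed across threads.

```python
from typing import Dict, List


def frontier_bfs(adj: Dict[int, List[int]], source: int) -> Dict[int, int]:
    """Level-synchronous BFS (hypothetical helper for illustration).

    Returns a mapping from each node reachable from `source` to its BFS level.
    """
    level = {source: 0}
    frontier = [source]  # nodes discovered in the previous step
    depth = 0
    while frontier:
        depth += 1
        next_frontier = []
        # A parallel implementation would assign frontier nodes (or their
        # edges) to separate threads; here they are processed one by one.
        for u in frontier:
            for v in adj.get(u, []):
                if v not in level:  # first visit fixes the BFS level
                    level[v] = depth
                    next_frontier.append(v)
        frontier = next_frontier  # synchronization point between levels
    return level
```

The per-level synchronization point is where a level-synchronous scheme can leave threads idle when a frontier is small, which is the kind of inefficiency an asynchronous hybrid approach aims to reduce.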
