Graph Reachability on Parallel Many-Core Architectures

Many modern applications are modeled as graphs. Given a graph, reachability, that is, determining whether there is a path between two given nodes, is a fundamental problem and a key step in many other algorithms. The rapid accumulation of very large graphs (up to tens of millions of vertices and edges) from a diversity of disciplines demands efficient and scalable solutions to the reachability problem. General-purpose computing on Graphics Processing Units (GPUs) has been successfully used to parallelize algorithms that exhibit a high degree of regularity. In this paper, we extend the applicability of GPU processing to graph-based manipulation by re-designing a simple but efficient state-of-the-art graph-labeling method, namely the GRAIL (Graph Reachability Indexing via RAndomized Interval) algorithm, for many-core CUDA-based GPUs. This algorithm first generates a label for each vertex of the graph, then exploits these labels to answer reachability queries. Unfortunately, the original algorithm executes a sequence of depth-first visits, which are intrinsically recursive and cannot be efficiently implemented on parallel systems. For that reason, we design an alternative approach in which a sequence of breadth-first visits replaces the original depth-first traversal to generate the labeling, and in which a high number of concurrent visits is exploited during query evaluation. The paper describes our strategy for re-designing these steps, the difficulties we encountered in implementing them, and the solutions adopted to overcome the main inefficiencies. To validate our approach, we compare our GPU-based implementation with the original sequential CPU-based tool in terms of time and memory requirements. Finally, we outline directions for further research in the area.
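
To make the two phases mentioned above concrete (labeling, then label-assisted queries), the following is a minimal sequential sketch of GRAIL-style interval labeling in Python. It is an illustration under stated assumptions, not the GPU design described in the paper: it uses the depth-first traversal of the original CPU algorithm rather than the breadth-first redesign, it assumes the input is a DAG given as a dictionary in which every vertex appears as a key, and all names (`one_labeling`, `grail_labels`, `contains`, `reachable`) are hypothetical.

```python
import random


def one_labeling(adj, roots, rng):
    """One randomized post-order traversal; returns {vertex: (low, rank)}."""
    interval = {}
    rank = 0
    order = list(roots)
    rng.shuffle(order)
    for root in order:
        if root in interval:
            continue
        children = list(adj[root])
        rng.shuffle(children)
        # Explicit stack instead of recursion: (vertex, iterator over children).
        stack = [(root, iter(children))]
        while stack:
            v, it = stack[-1]
            pushed = False
            for c in it:
                if c not in interval:  # in a DAG, c cannot be on the stack
                    kids = list(adj[c])
                    rng.shuffle(kids)
                    stack.append((c, iter(kids)))
                    pushed = True
                    break
            if not pushed:
                # All children are labeled: assign the post-order rank and
                # the smallest lower bound found among the children.
                stack.pop()
                rank += 1
                low = min([rank] + [interval[c][0] for c in adj[v]])
                interval[v] = (low, rank)
    return interval


def grail_labels(adj, num_traversals=2, seed=0):
    """Label each vertex with one interval per randomized traversal."""
    rng = random.Random(seed)
    with_pred = {w for succs in adj.values() for w in succs}
    roots = [v for v in adj if v not in with_pred]
    labels = {v: [] for v in adj}
    for _ in range(num_traversals):
        for v, iv in one_labeling(adj, roots, rng).items():
            labels[v].append(iv)
    return labels


def contains(labels, u, v):
    """True if every interval of v lies inside the matching interval of u."""
    return all(lu[0] <= lv[0] and lv[1] <= lu[1]
               for lu, lv in zip(labels[u], labels[v]))


def reachable(adj, labels, u, v):
    """Exact query: labels prune the search, a DFS confirms the answer."""
    if u == v:
        return True
    if not contains(labels, u, v):
        return False  # non-containment proves v is unreachable from u
    stack, seen = [u], {u}
    while stack:
        x = stack.pop()
        for c in adj[x]:
            if c == v:
                return True
            if c not in seen and contains(labels, c, v):
                seen.add(c)
                stack.append(c)
    return False


# Small hypothetical DAG used only for illustration.
adj = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": [], "e": ["d"]}
labels = grail_labels(adj)
```

Interval containment is only a necessary condition for reachability, so a failed containment test answers a query immediately, while a successful one still needs the pruned search to rule out false positives. Using several randomized traversals is intended to make such false positives less frequent.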
