Accelerating reachability query processing based on DAG reduction

Answering reachability queries is one of the fundamental graph operations. Existing approaches build indexes and answer reachability queries on a directed acyclic graph (DAG) $G$, which is constructed by coalescing each strongly connected component of the given directed graph $\mathcal{G}$ into a node of $G$. Because $G$ can still be too large to process efficiently, some studies further reduce $G$ to a smaller graph. However, these approaches either answer reachability queries inefficiently or cannot scale to large graphs. In this paper, we study DAG reduction to accelerate reachability query processing, which reduces the size of $G$ by computing transitive reduction (TR) followed by equivalence reduction (ER). For TR, we propose a bottom-up algorithm, namely buTR, which removes all redundant edges from $G$ to obtain the unique smallest DAG $G^{t}$ that has the same transitive closure as $G$. For ER, we propose a divide-and-conquer algorithm, namely linear-ER. Given the result $G^{t}$ of TR, linear-ER obtains a smaller DAG $G^{\varepsilon}$ in linear time based on the equivalence relationship between nodes in $G$. Our DAG reduction approaches (TR and ER) significantly reduce time and space costs and scale to large graphs. Based on the result of DAG reduction, we further propose a graph decomposition-based algorithm to efficiently answer reachability queries. We confirm the efficiency of our approaches by extensive experimental studies for TR, ER, and reachability query processing using 20 real datasets. The complete source code is available for download at https://pan.baidu.com/s/1skHBXXN.
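To make the two reduction steps concrete, the following is a minimal sketch on a toy DAG. It is not buTR or linear-ER: it uses naive descendant sets, so it is far from linear time. It also assumes that two nodes are equivalent when they have the same ancestors and the same descendants, a definition the abstract does not spell out. All function names are hypothetical.

```python
from collections import defaultdict


def descendants(adj, nodes):
    """Map each node of a DAG to the set of nodes reachable from it (itself excluded)."""
    memo = {}

    def visit(v):
        if v not in memo:
            reach = set()
            for w in adj[v]:
                reach.add(w)
                reach |= visit(w)
            memo[v] = reach
        return memo[v]

    for v in nodes:
        visit(v)
    return memo


def transitive_reduction(nodes, edges):
    """Naive TR: drop edge (u, v) if another out-neighbour of u already reaches v."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
    desc = descendants(adj, nodes)
    return {
        (u, v)
        for u, v in edges
        if not any(v in desc[w] for w in adj[u] if w != v)
    }


def equivalence_reduction(nodes, edges):
    """Naive ER (assumed definition): merge nodes with identical ancestor and descendant sets."""
    adj, radj = defaultdict(set), defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        radj[v].add(u)
    desc = descendants(adj, nodes)
    anc = descendants(radj, nodes)  # ancestors are descendants in the reversed graph
    classes = defaultdict(list)
    for v in nodes:
        classes[(frozenset(anc[v]), frozenset(desc[v]))].append(v)
    # Pick the smallest label in each class as its representative.
    rep = {v: min(group) for group in classes.values() for v in group}
    reduced = {(rep[u], rep[v]) for u, v in edges}
    return rep, reduced


# Toy DAG: a->d and a->e are redundant; b and c are equivalent.
nodes = ["a", "b", "c", "d", "e"]
edges = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d"),
         ("a", "d"), ("d", "e"), ("a", "e")]

tr_edges = transitive_reduction(nodes, edges)
rep, er_edges = equivalence_reduction(nodes, tr_edges)
print(sorted(tr_edges))
print(sorted(er_edges))
```

In this toy graph, TR shrinks seven edges to five, and ER then merges `b` and `c` into one node, leaving a chain of three edges. A reachability query on the original nodes can be mapped through `rep` before being answered on the reduced graph. Two distinct nodes that share a representative are equivalent, and under the assumed definition neither reaches the other.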
