Reachability Querying: An Independent Permutation Labeling Approach

Reachability querying is a fundamental graph operation that answers whether one vertex can reach another in a large directed graph G with n vertices and m edges, and it has been studied extensively. Existing approaches compute a label for every vertex of G during offline index construction, and the quality of these labels determines how quickly reachability queries can be answered online. The three main costs are the index construction time, the index size, and the query time. Some state-of-the-art approaches answer reachability queries efficiently but take non-linear time to construct an index. Others construct an index in linear time and space but may need to run a depth-first search over G at query time, which costs O(n + m). In this paper, we are the first to propose a randomized labeling approach to answering reachability queries, where the randomness comes from independent permutation. We conduct extensive experiments comparing our approach with state-of-the-art approaches on 19 large real datasets used in existing work, as well as on synthetic datasets. The results confirm the efficiency of our approach.
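
The abstract does not spell out the labeling itself, so the following is only a minimal sketch of the general idea, under stated assumptions, and not the paper's algorithm. It assumes G is a DAG, draws one random permutation of the vertices, and labels each vertex with the k smallest permutation ranks among the vertices it can reach. If u reaches v, everything reachable from v is also reachable from u, so a label mismatch proves non-reachability; otherwise the query falls back to a pruned depth-first search. The names `build_labels`, `may_reach`, `reachable`, and the parameter `k` are hypothetical.

```python
import random
from collections import deque


def build_labels(n, edges, k=2, seed=0):
    """Return (adjacency lists, labels) for a DAG with vertices 0..n-1."""
    rng = random.Random(seed)
    rank = list(range(n))
    rng.shuffle(rank)  # rank[v] = position of v in one random permutation

    adj = [[] for _ in range(n)]
    indeg = [0] * n
    for u, v in edges:
        adj[u].append(v)
        indeg[v] += 1

    # Topological order (Kahn's algorithm); assumes the input is acyclic.
    queue = deque(v for v in range(n) if indeg[v] == 0)
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for w in adj[u]:
            indeg[w] -= 1
            if indeg[w] == 0:
                queue.append(w)
    if len(order) != n:
        raise ValueError("graph must be acyclic")

    # Process sinks first: label[u] = k smallest ranks reachable from u.
    label = [()] * n
    for u in reversed(order):
        candidates = {rank[u]}
        for w in adj[u]:
            candidates.update(label[w])
        label[u] = tuple(sorted(candidates)[:k])
    return adj, label


def may_reach(label, k, u, v):
    """False means u certainly cannot reach v; True means 'unknown'."""
    lu, lv = label[u], label[v]
    if len(lu) < k:
        # lu holds every rank reachable from u, so lv must be a subset.
        return set(lv) <= set(lu)
    bound = lu[-1]
    members = set(lu)
    # Any rank of v's label at or below u's k-th rank must appear in lu.
    return all(r in members for r in lv if r <= bound)


def reachable(adj, label, k, u, v):
    """Answer the query exactly: label test first, pruned DFS as fallback."""
    if u == v:
        return True
    if not may_reach(label, k, u, v):
        return False
    stack, seen = [u], {u}
    while stack:
        x = stack.pop()
        for w in adj[x]:
            if w == v:
                return True
            if w not in seen and may_reach(label, k, w, v):
                seen.add(w)
                stack.append(w)
    return False


edges = [(0, 1), (1, 2), (0, 3), (4, 2)]
adj, label = build_labels(5, edges, k=2, seed=0)
print(reachable(adj, label, 2, 0, 2), reachable(adj, label, 2, 3, 2))
```

The label test can only rule a pair out, never confirm it, which is why the search fallback remains; a larger `k` makes the test reject more unreachable pairs at the cost of a larger index.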
