Computing weight constraint reachability in large networks

Reachability is a fundamental problem on the large-scale networks that arise in many application domains, such as social networks, communication networks, biological networks, and road networks, and it has been studied extensively. However, little existing work has studied reachability under realistic constraints on graphs with real-valued edge or node weights. Such weights are in fact very common in real-world networks: examples include the bandwidth of a link in a communication network, the reliability of an interaction between two proteins in a protein-protein interaction (PPI) network, and the handling capacity of a warehouse or storage point in a distribution network. In this paper, we formalize a new and important reachability query on weighted undirected graphs, called the weight constraint reachability (WCR) query, which asks whether there is a path between nodes $a$ and $b$ on which every real-valued edge (or node) weight satisfies a given range constraint. We discover an interesting property of WCR and, based on it, design a novel edge-based index structure that answers a WCR query in $O(1)$ time. Furthermore, we consider the case in which the index cannot fit entirely in main memory, which is common for emerging massive networks. We propose an I/O-efficient index that answers a query with a constant number of I/Os (precisely four) using a disk-based index of size $O(|V|\log|V|)$. Extensive experiments on both real and synthetic datasets demonstrate the efficiency and scalability of our solutions in answering WCR queries.
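To make the query semantics concrete, here is a minimal Python sketch of a naive online baseline. It is *not* the indexed method described above. It keeps only the edges whose weight lies in the range and then runs a breadth-first search from $a$. The function name `wcr_query`, the integer node labels `0..n-1`, the inclusive range `[lo, hi]`, and the convention that a node always reaches itself are illustrative assumptions. The sketch handles edge weights only. Each call costs $O(|V| + |E|)$ time, which is exactly the per-query cost an $O(1)$ index is meant to avoid.

```python
from collections import deque


def wcr_query(n, edges, a, b, lo, hi):
    """Naive WCR baseline (illustrative sketch, not the paper's index).

    Returns True if some path between a and b in an undirected graph uses
    only edges whose weight w satisfies lo <= w <= hi (inclusive range is an
    assumption of this sketch).

    n     -- number of nodes, labelled 0..n-1 (hypothetical labelling)
    edges -- iterable of (u, v, w) undirected weighted edges
    """
    # Assumed convention: the empty path trivially connects a node to itself.
    if a == b:
        return True

    # Build an adjacency list restricted to edges that satisfy the constraint.
    adj = [[] for _ in range(n)]
    for u, v, w in edges:
        if lo <= w <= hi:
            adj[u].append(v)
            adj[v].append(u)

    # Standard BFS from a over the filtered graph.
    seen = [False] * n
    seen[a] = True
    queue = deque([a])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y == b:
                return True
            if not seen[y]:
                seen[y] = True
                queue.append(y)
    return False
```

For example, with edges `[(0, 1, 3.0), (1, 2, 5.0), (2, 3, 2.0), (0, 3, 9.0)]`, the query from node 0 to node 2 with range `[3.0, 5.0]` succeeds via 0-1-2, but node 3 is unreachable under that range because both of its edges (weights 2.0 and 9.0) fall outside it. The filtered graph is rebuilt on every call to keep the sketch simple, which also illustrates why such a baseline scales poorly on massive networks.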
