On-line exact shortest distance query processing

Shortest-path query processing is a long-established routine in numerous applications and is increasingly used to support novel graph applications over very large databases. Large graphs give rise to a new scenario: intensive querying against arbitrary nodes, where each query must quickly return the distance between two nodes or even a shortest path. Traditional main-memory algorithms and the materialization of shortest paths are inadequate in this setting. We are interested in graph labelings, which encode the underlying graph by assigning labels to nodes so that queries can be processed efficiently. Surprisingly, existing work in this category mainly emphasizes reachability query processing, and little effort has been devoted to distance labelings that support querying exact shortest distances between nodes. Distance labelings must be developed on the graph as a whole in order to retain node distance information correctly, which makes many existing methods inapplicable. We focus on the fast computation of distance-aware 2-hop covers, which can encode the all-pairs shortest paths of a graph in O(|V|·|E|^{1/2}) space. Our approach exploits the collapsing of strongly connected components and graph partitioning to gain speed, while overcoming the challenges of correctly retaining node distance information and encoding all-pairs shortest paths with small overhead. Furthermore, our approach avoids pre-computing all-pairs shortest paths, which can be prohibitive over large graphs. We conducted extensive performance studies, which confirm the efficiency of the proposed approach.
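To make the idea of a distance-aware 2-hop cover concrete, the following minimal Python sketch shows how a distance query could be answered from node labels alone. It is an illustration under stated assumptions, not the construction algorithm of this paper: the names `L_OUT`, `L_IN`, and `query_distance` are hypothetical, and the labels are written by hand for a toy directed path graph a -> b -> c -> d with unit edge weights. Each node u stores an out-label mapping centers to d(u, center) and an in-label mapping centers to d(center, u); the distance from u to v is the minimum of d(u, w) + d(w, v) over centers w that appear in both labels.

```python
import math

# Hand-built labels for the toy graph a -> b -> c -> d (unit weights).
# Assumption: these labels are illustrative only; they are not produced
# by the partitioning-based construction described in the abstract.
# L_OUT[u][w] = d(u, w) for each center w stored in u's out-label.
L_OUT = {
    "a": {"b": 1},
    "b": {"b": 0, "c": 1},
    "c": {"c": 0},
    "d": {},
}
# L_IN[v][w] = d(w, v) for each center w stored in v's in-label.
L_IN = {
    "a": {},
    "b": {"b": 0},
    "c": {"b": 1, "c": 0},
    "d": {"b": 2, "c": 1},
}


def query_distance(l_out, l_in, u, v):
    """Return the exact distance from u to v using only the two labels.

    Returns math.inf when the labels share no center, i.e. when v is
    not reachable from u.
    """
    if u == v:
        return 0
    out_u, in_v = l_out[u], l_in[v]
    # Iterate over the smaller label and probe the larger one.
    if len(out_u) <= len(in_v):
        common = (w for w in out_u if w in in_v)
    else:
        common = (w for w in in_v if w in out_u)
    return min((out_u[w] + in_v[w] for w in common), default=math.inf)


if __name__ == "__main__":
    for source, target in [("a", "d"), ("b", "c"), ("d", "a")]:
        print(source, target, query_distance(L_OUT, L_IN, source, target))
```

A query touches only the two labels involved, so its cost depends on label size rather than on the size of the graph, which is why keeping the total label size small matters.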
