Estimating pairwise distances in large graphs

Point-to-point distance estimation in large-scale graphs is a fundamental and well-studied problem with applications in many areas, such as social search. Previous work has focused on selecting an appropriate subset of vertices as landmarks, aiming to derive upper or lower distance bounds that are as tight as possible. To compute a distance bound between two vertices, these methods apply the triangle inequality to the precomputed distances between each of the two vertices and the landmarks, and then keep the tightest resulting bound. In this work we take a fresh look at this setting and approach it as a learning problem. As features, we use structural attributes of the vertices involved as well as the bounds described above, and we learn a function that predicts the distance between a source and a destination vertex. We conduct an extensive experimental evaluation on a variety of real-world graphs and show that our proposed methods achieve a significantly lower average relative prediction error than state-of-the-art landmark-based estimates. Our approach is particularly efficient when the available space is very limited.
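The landmark bounds described above can be illustrated with a minimal sketch. It assumes an undirected, unweighted graph stored as a Python adjacency dict, so one BFS per landmark gives exact landmark distances. For every landmark l, d(s, l) + d(l, t) is an upper bound on d(s, t) and |d(s, l) - d(l, t)| is a lower bound; the sketch keeps the tightest of each. The function names and the degree-based "structural attributes" are hypothetical illustrations, not the paper's actual feature set. The learner that maps features to a predicted distance is also left unspecified here, so any regressor could consume the resulting vectors.

```python
from collections import deque


def bfs_distances(graph, source):
    """Hop distances from source to every reachable vertex (unweighted BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def precompute_landmark_distances(graph, landmarks):
    """Offline step: one BFS per landmark, stored as {landmark: {vertex: distance}}."""
    return {l: bfs_distances(graph, l) for l in landmarks}


def landmark_bounds(landmark_dist, s, t):
    """Tightest triangle-inequality bounds on d(s, t) over all landmarks.

    Assumes an undirected graph, where |d(s,l) - d(l,t)| <= d(s,t) <= d(s,l) + d(l,t).
    Landmarks that cannot reach both s and t are skipped.
    """
    lower, upper = 0, float("inf")
    for dist in landmark_dist.values():
        if s in dist and t in dist:
            ds, dt = dist[s], dist[t]
            upper = min(upper, ds + dt)
            lower = max(lower, abs(ds - dt))
    return lower, upper


def feature_vector(graph, landmark_dist, s, t):
    """Hypothetical features: both bounds plus the degrees of s and t."""
    lower, upper = landmark_bounds(landmark_dist, s, t)
    return [lower, upper, len(graph[s]), len(graph[t])]


if __name__ == "__main__":
    # Path graph 0-1-2-3-4 with landmarks 1 and 3; the true distance d(0, 4) is 4.
    path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
    ld = precompute_landmark_distances(path, [1, 3])
    print(landmark_bounds(ld, 0, 4))     # (2, 4)
    print(feature_vector(path, ld, 0, 4))  # [2, 4, 1, 1]
```

In this toy example the upper bound happens to be exact while the lower bound is loose. Gaps of this kind between the bounds are what a learned predictor, fed the bounds together with other vertex attributes, can try to close.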
