Fast shortest path distance estimation in large networks

We study approximate landmark-based methods for point-to-point distance estimation in very large networks. These methods select a subset of nodes as landmarks and compute offline the distance from every node in the graph to each landmark. At runtime, the distance between a pair of nodes is estimated quickly by combining the precomputed distances. We prove that selecting the optimal set of landmarks is NP-hard, so heuristic solutions are needed. We therefore draw on theoretical insights to devise a variety of simple methods that scale well to very large networks. We evaluate the efficiency of the proposed techniques experimentally on five real-world graphs with millions of edges. Although theoretical bounds support the claim that random landmarks work well in practice, our extensive experiments show that smart landmark selection can yield dramatically more accurate results: for a given target accuracy, our methods require as much as 250 times less space than selecting landmarks at random. In addition, we demonstrate that, at a very small loss in accuracy, our techniques are several orders of magnitude faster than state-of-the-art exact methods. Finally, we study an application of our methods to the task of social search in large graphs.
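
The offline/online split described above can be illustrated with a minimal sketch. It assumes an undirected, unweighted graph stored as an adjacency dictionary, and it combines the precomputed distances using the triangle inequality, which is one common way to do so and not necessarily the exact rule used in the paper. The function names (`bfs_distances`, `build_index`, `estimate_bounds`) and the toy graph are hypothetical, and the landmark is chosen by hand; no selection heuristic is shown.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from `source` to every reachable node (unweighted graph)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adj[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def build_index(adj, landmarks):
    """Offline step: one BFS per landmark, stored as landmark -> {node: distance}."""
    return {u: bfs_distances(adj, u) for u in landmarks}


def estimate_bounds(index, s, t):
    """Online step: bound d(s, t) using only the precomputed landmark distances.

    For every landmark u, the triangle inequality gives
        |d(s, u) - d(u, t)| <= d(s, t) <= d(s, u) + d(u, t).
    We keep the tightest lower and upper bound over all landmarks.
    """
    lower, upper = 0, float("inf")
    for dist in index.values():
        if s in dist and t in dist:  # skip landmarks that cannot reach both nodes
            lower = max(lower, abs(dist[s] - dist[t]))
            upper = min(upper, dist[s] + dist[t])
    return lower, upper


# Toy path graph 0 - 1 - 2 - 3 - 4 with node 2 as the only landmark (assumption).
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
index = build_index(adj, landmarks=[2])

print(estimate_bounds(index, 0, 4))  # (0, 4): upper bound is exact, landmark lies on the path
print(estimate_bounds(index, 0, 1))  # (1, 3): lower bound is exact, upper bound overshoots
```

The two queries show why landmark placement matters: the upper bound is tight when a landmark lies on a shortest path between the query nodes, and loose otherwise. Each query costs time proportional to the number of landmarks, independent of graph size, while the index stores one distance per node per landmark.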
