Fast exact shortest-path distance queries on large networks by pruned landmark labeling

We propose a new exact method for shortest-path distance queries on large-scale networks. Our method precomputes distance labels for vertices by performing a breadth-first search from every vertex. Although this approach may seem too obvious and too inefficient at first glance, the key ingredient introduced here is pruning during the breadth-first searches. Surprisingly, pruning reduces both the search space and the label sizes, while the labels still yield the exact distance for every pair of vertices. Moreover, we show that we can perform 32 or 64 breadth-first searches simultaneously by exploiting bitwise operations. We experimentally demonstrate that the combination of these two techniques is efficient and robust on various kinds of large-scale real-world networks. In particular, our method can handle social networks and web graphs with hundreds of millions of edges, two orders of magnitude larger than the limits of previous exact methods, with query times comparable to those of previous methods.
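The labeling and query steps described above can be sketched as follows. This is a minimal Python sketch, not the authors' implementation. It assumes an unweighted, undirected graph given as adjacency lists over vertices 0..n-1. It also assumes vertices are processed in decreasing-degree order, which is one common ordering heuristic. Each label is a plain dict from hub rank to distance, a simpler but slower choice than sorted arrays. The function names `build_labels`, `query_labels`, and `distance` are hypothetical, and the bit-parallel breadth-first searches are not shown.

```python
from collections import deque
import math


def query_labels(la, lb):
    """Return min over common hubs h of la[h] + lb[h], or math.inf if none."""
    if len(la) > len(lb):  # iterate over the smaller label
        la, lb = lb, la
    best = math.inf
    for hub, d in la.items():
        other = lb.get(hub)
        if other is not None and d + other < best:
            best = d + other
    return best


def build_labels(adj):
    """Pruned landmark labeling for an unweighted, undirected graph.

    adj[v] lists the neighbors of vertex v (vertices are 0..n-1).
    Returns labels, where labels[v] maps hub rank -> exact distance.
    """
    n = len(adj)
    # Assumption: process vertices in decreasing-degree order.
    order = sorted(range(n), key=lambda v: -len(adj[v]))
    labels = [{} for _ in range(n)]
    for rank, root in enumerate(order):
        dist = {root: 0}  # BFS distances; also serves as the visited set
        queue = deque([root])
        while queue:
            u = queue.popleft()
            d = dist[u]
            # Prune: earlier hubs already certify a root-u path of length <= d,
            # so u gets no new entry and its neighbors are not expanded from it.
            if query_labels(labels[root], labels[u]) <= d:
                continue
            labels[u][rank] = d
            for w in adj[u]:
                if w not in dist:
                    dist[w] = d + 1
                    queue.append(w)
    return labels


def distance(labels, s, t):
    """Exact s-t distance from the labels (math.inf if disconnected)."""
    return query_labels(labels[s], labels[t])


if __name__ == "__main__":
    # A 5-cycle 0-1-2-3-4-0 with a pendant vertex 5 attached to 4.
    adj = [[1, 4], [0, 2], [1, 3], [2, 4], [3, 0, 5], [4]]
    labels = build_labels(adj)
    print(distance(labels, 5, 2))  # 3
    print(distance(labels, 0, 2))  # 2
```

A visit is pruned exactly when the labels built so far already witness a path from the current root that is no longer than the BFS distance, so pruned vertices add nothing to any label. Because each vertex's own search always labels it with itself at distance 0, `distance(labels, s, s)` returns 0.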
