Pruning based Distance Sketches with Provable Guarantees on Random Graphs

Measuring the distances between vertices on graphs is one of the most fundamental components in network analysis. Since finding shortest paths requires traversing the graph, it is challenging to obtain distance information on large graphs very quickly. In this work, we present a preprocessing algorithm that efficiently creates landmark-based distance sketches, with strong theoretical guarantees. When evaluated on a diverse set of social and information networks, our algorithm significantly improves over existing approaches by reducing the number of landmarks stored, the preprocessing time, or the stretch of the estimated distances. On Erdős-Rényi graphs and random power law graphs with degree distribution exponent 2 < β < 3, our algorithm outputs an exact distance data structure with space between Θ(n^{5/4}) and Θ(n^{3/2}) depending on the value of β, where n is the number of vertices. We complement the algorithm with tight lower bounds for Erdős-Rényi graphs and for the case when β is close to two.
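To make the notion of a landmark-based distance sketch concrete, the following is a minimal Python sketch of the generic scheme, assuming an unweighted, undirected graph given as an adjacency dictionary. It is not the pruning algorithm of this paper, whose details the abstract does not give; the function names (`bfs_distances`, `build_sketches`, `estimate_distance`) and the toy path graph are hypothetical and chosen only for illustration. Each vertex stores its distance to every landmark, and a query returns the shortest route through a common landmark, which is an upper bound on the true distance.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from `source` to every reachable vertex (unweighted BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def build_sketches(adj, landmarks):
    """For each vertex, store its distance to every landmark that can reach it."""
    sketches = {v: {} for v in adj}
    for lm in landmarks:
        for v, d in bfs_distances(adj, lm).items():
            sketches[v][lm] = d
    return sketches


def estimate_distance(sketches, u, v):
    """Upper bound on d(u, v): the best route through a landmark common to both sketches."""
    if u == v:
        return 0
    common = sketches[u].keys() & sketches[v].keys()
    if not common:
        return None  # no shared landmark, so the sketch gives no estimate
    return min(sketches[u][lm] + sketches[v][lm] for lm in common)


# Illustrative toy input (an assumption, not data from the paper): the path 0-1-2-3-4.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
sketches = build_sketches(adj, landmarks=[2])

exact = estimate_distance(sketches, 0, 4)      # landmark 2 lies on the shortest path
stretched = estimate_distance(sketches, 0, 1)  # landmark 2 is a detour for this pair
```

On this toy path, the estimate for the pair (0, 4) equals the true distance 4 because the landmark lies on a shortest path, while the estimate for (0, 1) is 3 against a true distance of 1. This gap is the stretch the abstract refers to, and it is the reason the choice and number of landmarks matter.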
