Fast Low-Cost Estimation of Network Properties Using Random Walks

Abstract

We study the use of random walks as an efficient method for estimating global properties of large, connected, undirected graphs. Typical properties of interest include the numbers of edges, vertices, and triangles, and, more generally, the number of copies of small fixed subgraphs. We consider two methods based on the first returns of random walks: (1) the cycle formula of regenerative processes, and (2) weighted random walks whose edge weights are defined by the property under investigation. We review the theoretical foundations of these methods and indicate how they can be adapted for the general nonintrusive investigation of large online networks. The expected value and variance of the first return time of a random walk to a vertex decrease as the weight of that vertex increases, so for a given time budget, returns to high-weight vertices should give the best property estimates. We present theoretical and experimental results on the rate of convergence of the estimates as a function of the number of returns of a random walk to a given start vertex. We performed experiments estimating the numbers of vertices, edges, and triangles in two test graphs.
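The Python sketch below illustrates method (1) for a simple (unweighted) random walk. It relies on two standard facts for a connected undirected graph with n vertices and m edges. First, the stationary probability of vertex u is pi(u) = d(u)/2m, so the expected first return time to v is 2m/d(v). Second, the cycle formula states that E_v[sum_{t=0}^{T_v^+ - 1} f(X_t)] = (sum_u pi(u) f(u)) / pi(v). Choosing f(u) = 1/d(u) gives n/d(v). Choosing f(u) = t(u)/d(u) gives 3T/d(v), where t(u) is the number of triangles containing u and T is the total number of triangles. The names build_graph, local_triangles, and estimate_by_returns are hypothetical. The sketch assumes the walker can read the neighbour lists of the vertices it visits, which is needed to compute t(u). It is not the authors' implementation and does not cover the weighted walks of method (2).

```python
import random
from collections import defaultdict


def build_graph(edges):
    """Return adjacency sets for an undirected simple graph given as an edge list."""
    adj = defaultdict(set)
    for u, w in edges:
        adj[u].add(w)
        adj[w].add(u)
    return dict(adj)


def local_triangles(adj, u):
    """Count triangles containing u, i.e. pairs of neighbours of u that are adjacent."""
    nbrs = list(adj[u])
    return sum(
        1
        for i in range(len(nbrs))
        for j in range(i + 1, len(nbrs))
        if nbrs[j] in adj[nbrs[i]]
    )


def estimate_by_returns(adj, v, returns, seed=0):
    """Estimate (edges, vertices, triangles) from `returns` excursions of a
    simple random walk started at v (hypothetical helper, not the paper's code).

    Uses the cycle formula E_v[sum_{t < T_v^+} f(X_t)] = pi(f) / pi(v)
    with pi(u) = d(u) / 2m on a connected undirected graph.
    """
    rng = random.Random(seed)
    neighbours = {u: list(ns) for u, ns in adj.items()}
    tri = {}            # cache of t(u) for vertices seen so far
    steps = 0           # total length of all excursions
    inv_deg_sum = 0.0   # running sum of 1/d(X_t)
    tri_sum = 0.0       # running sum of t(X_t)/d(X_t)
    x = v
    for _ in range(returns):
        # One excursion: add f(X_t) for t = 0 .. T_v^+ - 1, then stop on return to v.
        while True:
            d = len(neighbours[x])
            if x not in tri:
                tri[x] = local_triangles(adj, x)
            inv_deg_sum += 1.0 / d
            tri_sum += tri[x] / d
            x = rng.choice(neighbours[x])
            steps += 1
            if x == v:
                break
    dv = len(neighbours[v])
    m_hat = dv * (steps / returns) / 2        # E_v[T_v^+] = 2m / d(v)
    n_hat = dv * inv_deg_sum / returns        # f = 1/d  gives  n / d(v)
    t_hat = dv * tri_sum / (3 * returns)      # f = t/d  gives  3T / d(v)
    return m_hat, n_hat, t_hat


if __name__ == "__main__":
    # Wheel graph: hub 0 joined to a 10-cycle, so m = 20, n = 11, T = 10.
    k = 10
    spokes = [(0, i) for i in range(1, k + 1)]
    rim = [(i, i % k + 1) for i in range(1, k + 1)]
    adj = build_graph(spokes + rim)
    m_hat, n_hat, t_hat = estimate_by_returns(adj, v=0, returns=20000, seed=1)
    print(f"edges ~ {m_hat:.2f}, vertices ~ {n_hat:.2f}, triangles ~ {t_hat:.2f}")
```

Starting the walk at the hub, whose degree is 10, keeps the excursions short, with a mean return time of 2m/d(v) = 4 steps. This is the effect the abstract describes for high-weight start vertices.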

Colin Cooper et al. Fast Low-Cost Estimation of Network Properties Using Random Walks. Internet Mathematics, 2013.