Beyond Random Walk and Metropolis-Hastings Samplers: Why You Should Not Backtrack for Unbiased Graph Sampling

Graph sampling via crawling has been actively studied as a generic and important tool for collecting uniform node samples, so as to consistently estimate and uncover various characteristics of complex networks. The simple random walk with re-weighting (SRW-rw) and the Metropolis-Hastings (MH) algorithm have been popular in the literature for such unbiased graph sampling. However, their core random walks have an unavoidable downside: slow diffusion over the space, which can lead to poor estimation accuracy. In this paper, we propose the non-backtracking random walk with re-weighting (NBRW-rw) and the MH algorithm with delayed acceptance (MHDA). Both are theoretically guaranteed to achieve, at almost no additional cost, not only unbiased graph sampling but also higher efficiency (smaller asymptotic variance of the resulting unbiased estimators) than SRW-rw and the MH algorithm, respectively. A notable feature of MHDA is that, like the MH algorithm, it applies to any non-uniform node sampling, while guaranteeing better sampling efficiency than the MH algorithm. We also provide simulation results that confirm our theoretical findings.
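The walks named above can be illustrated with a minimal sketch. This is an illustration under stated assumptions, not the authors' implementation. It assumes a simple, connected, undirected graph stored as adjacency lists, and it uses a hypothetical 5-node example graph. The helpers `srw_path`, `nbrw_path`, `mh_path`, `reweighted_mean`, and `estimate_all` are names chosen here for illustration.

SRW and NBRW both visit node v with long-run frequency proportional to its degree d(v). The re-weighting step therefore uses the ratio estimator sum f(X_t)/d(X_t) divided by sum 1/d(X_t). MH for uniform sampling instead corrects the bias inside the walk, so a plain sample mean suffices. MHDA's two-stage acceptance is not sketched here.

```python
import random

# Hypothetical example graph (not from the paper): simple, connected, undirected.
EDGES = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4)]

def build_adj(edges):
    """Build adjacency lists from an undirected edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    return adj

def srw_path(adj, start, steps, rng):
    """Simple random walk: move to a uniformly chosen neighbor."""
    path, cur = [start], start
    for _ in range(steps):
        cur = rng.choice(adj[cur])
        path.append(cur)
    return path

def nbrw_path(adj, start, steps, rng):
    """Non-backtracking walk: pick a uniform neighbor other than the previous
    node; backtrack only when the current node has degree 1 (dead end)."""
    prev, cur, path = None, start, [start]
    for _ in range(steps):
        nbrs = adj[cur]
        if prev is None or len(nbrs) == 1:
            nxt = rng.choice(nbrs)
        else:
            nxt = rng.choice([v for v in nbrs if v != prev])
        prev, cur = cur, nxt
        path.append(cur)
    return path

def mh_path(adj, start, steps, rng):
    """MH walk targeting the uniform distribution: propose a uniform neighbor j
    of the current node i, accept with prob. min(1, d(i)/d(j)), else stay."""
    path, cur = [start], start
    for _ in range(steps):
        cand = rng.choice(adj[cur])
        if rng.random() < min(1.0, len(adj[cur]) / len(adj[cand])):
            cur = cand
        path.append(cur)
    return path

def reweighted_mean(adj, path, f):
    """Ratio estimator: weight each visited node by 1/degree to undo the
    degree-proportional visiting frequency of SRW and NBRW."""
    num = sum(f(v) / len(adj[v]) for v in path)
    den = sum(1.0 / len(adj[v]) for v in path)
    return num / den

def estimate_all(adj, f, steps=100_000, seed=0):
    """Estimate the uniform node average of f with each of the three walks."""
    rng = random.Random(seed)
    start = next(iter(adj))
    srw = reweighted_mean(adj, srw_path(adj, start, steps, rng), f)
    nbrw = reweighted_mean(adj, nbrw_path(adj, start, steps, rng), f)
    mh_samples = mh_path(adj, start, steps, rng)
    mh = sum(f(v) for v in mh_samples) / len(mh_samples)
    return {"SRW-rw": srw, "NBRW-rw": nbrw, "MH": mh}

if __name__ == "__main__":
    adj = build_adj(EDGES)
    # The true uniform average degree here is 2|E|/|V| = 2.0;
    # each estimate should land near it.
    print(estimate_all(adj, lambda v: len(adj[v])))
```

The three estimators only demonstrate unbiasedness on a toy graph. Comparing asymptotic variances, which is the paper's actual claim, would require repeating the runs many times and measuring the spread of each estimator.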
