Respondent-Driven Sampling for Characterizing Unstructured Overlays

short-lived or high-degree peers due to the dynamics of peer participation or the heterogeneity of peer degrees, respectively. This paper presents Respondent-Driven Sampling (RDS) as a promising technique for sampling unstructured P2P overlays. RDS allows one to accurately estimate the distribution of a desired peer property without capturing the entire overlay structure. RDS is a variant of snowball sampling that has been proposed and used in the social sciences to characterize hidden populations in a society [9], [13]. We apply the RDS technique to unstructured P2P networks and evaluate its performance over a wide range of static and dynamic graphs as well as a widely deployed P2P system. Throughout our evaluation, we compare and contrast the performance of the RDS technique with another sampling technique, namely Metropolized Random Walk (MRW), that we developed in our earlier work [16].

Our main findings can be summarized as follows. First, RDS outperforms MRW across all scenarios. In particular, RDS performs significantly better than MRW when the overlay structure exhibits a combination of highly skewed node degrees and highly skewed (local) clustering coefficients. Second, our simulation and empirical evaluations reveal that both the RDS and MRW techniques can accurately estimate key peer properties over dynamic unstructured overlays. Third, our empirical evaluations suggest that the efficiency of the two sampling techniques in practice is lower than in our simulations involving synthetic graphs. We attribute this to our inability to capture accurate reference snapshots.

The rest of the paper is organized as follows. Section II presents an overview of both the RDS and MRW techniques and sketches our evaluation methodology. We examine both techniques over a variety of static and dynamic graphs in Sections III and IV, respectively. Section V presents the empirical evaluation of the two sampling techniques over the Gnutella network.
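To make the contrast between the two techniques concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes that RDS is realized as a simple random walk whose samples are reweighted by the inverse of each peer's degree, and that MRW is a Metropolis-Hastings walk that accepts a move from peer x to neighbor y with probability min(1, deg(x)/deg(y)). The abstract does not spell out either procedure, so both are assumptions here, and the toy graph (`skewed_graph`) and all function names are hypothetical.

```python
import random
from collections import defaultdict


def skewed_graph(n_leaves):
    """Toy overlay (hypothetical): hub 0 linked to every leaf, leaves joined in a ring."""
    adj = defaultdict(list)
    for i in range(1, n_leaves + 1):
        nxt = i % n_leaves + 1          # next leaf on the ring
        adj[0].append(i)
        adj[i].append(0)
        adj[i].append(nxt)
        adj[nxt].append(i)
    return dict(adj)


def random_walk(adj, start, steps, rng):
    """Simple random walk; visits peers in proportion to their degree."""
    node, visited = start, []
    for _ in range(steps):
        node = rng.choice(adj[node])
        visited.append(node)
    return visited


def rds_estimate(adj, samples, prop):
    """Assumed RDS-style estimator: weight each sample by 1/degree, then normalize."""
    weights = defaultdict(float)
    for v in samples:
        weights[prop(v)] += 1.0 / len(adj[v])
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}


def mrw_walk(adj, start, steps, rng):
    """Assumed MRW: accept a proposed neighbor with probability min(1, deg(x)/deg(y))."""
    node, visited = start, []
    for _ in range(steps):
        cand = rng.choice(adj[node])
        if rng.random() < min(1.0, len(adj[node]) / len(adj[cand])):
            node = cand
        visited.append(node)          # a rejected move re-samples the current peer
    return visited


def plain_estimate(samples, prop):
    """Unweighted sample fractions (appropriate for MRW, biased for a plain walk)."""
    counts = defaultdict(int)
    for v in samples:
        counts[prop(v)] += 1
    return {k: c / len(samples) for k, c in counts.items()}


if __name__ == "__main__":
    adj = skewed_graph(20)
    degree = lambda v: len(adj[v])
    rw = random_walk(adj, 1, 20000, random.Random(0))
    mrw = mrw_walk(adj, 1, 20000, random.Random(0))
    print("true fraction of degree-20 peers:", 1 / len(adj))
    print("plain walk, unweighted:", plain_estimate(rw, degree).get(20, 0.0))
    print("RDS-style reweighting: ", rds_estimate(adj, rw, degree).get(20, 0.0))
    print("MRW, unweighted:       ", plain_estimate(mrw, degree).get(20, 0.0))
```

In this toy graph the hub is one peer out of 21, yet an unweighted simple random walk visits it far more often than that because of its high degree. Both the inverse-degree reweighting and the Metropolis-Hastings acceptance rule are ways of correcting this degree bias: the former corrects at estimation time, the latter at walk time.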

[1] Stefan Saroiu, et al. A Measurement Study of Peer-to-Peer File Sharing Systems, 2001.

[2] S. Chib, et al. Understanding the Metropolis-Hastings Algorithm, 1995.

[3] Albert-László Barabási, et al. Scale‐Free and Hierarchical Structures in Complex Networks, 2003.

[4] Albert, et al. Emergence of scaling in random networks, 1999, Science.

[5] Richard J. Lipton, et al. Random walks, universal traversal sequences, and the complexity of maze problems, 1979, 20th Annual Symposium on Foundations of Computer Science (sfcs 1979).

[6] N. Metropolis, et al. Equation of State Calculations by Fast Computing Machines, 1953, Resonance.

[7] Walter Willinger, et al. On unbiased sampling for unstructured peer-to-peer networks, 2009, TNET.

[8] László Lovász, et al. Random Walks on Graphs: A Survey, 1993.

[9] Matthew J. Salganik, et al. Sampling and Estimation in Hidden Populations Using Respondent-Driven Sampling, 2004.

[10] Walter Willinger, et al. Evaluating Sampling Techniques for Large Dynamic Graphs, 2008.

[11] Béla Bollobás, et al. A Probabilistic Proof of an Asymptotic Formula for the Number of Labelled Regular Graphs, 1980, Eur. J. Comb.

[12] W. K. Hastings, et al. Monte Carlo Sampling Methods Using Markov Chains and Their Applications, 1970.

[13] Daniel Stutzbach, et al. Characterizing unstructured overlay topologies in modern P2P file-sharing systems, 2005.

[14] M. H. Hansen, et al. On the Theory of Sampling from Finite Populations, 1943.

[15] Krishna P. Gummadi, et al. King: estimating latency between arbitrary internet end hosts, 2002, IMW '02.

[16] Douglas D. Heckathorn, et al. Respondent-driven sampling II: deriving valid population estimates from chain-referral samples of hi, 2002.

[17] Daniel Stutzbach, et al. Understanding churn in peer-to-peer networks, 2006, IMC '06.

[18] Daniel Stutzbach, et al. Characterizing Unstructured Overlay Topologies in Modern P2P File-Sharing Systems, 2005, IEEE/ACM Transactions on Networking.

[19] Matthew J. Salganik, et al. Respondent‐driven sampling as Markov chain Monte Carlo, 2009, Statistics in medicine.

[20] Daniel Stutzbach, et al. Characterizing unstructured overlay topologies in modern P2P file-sharing systems, 2008, TNET.

[21] L. Asz. Random Walks on Graphs: a Survey, 2022.