Estimating network properties from snowball sampled data

This article addresses the estimation of topological network parameters from data obtained with a snowball sampling design. An approximate expression for the probability that a vertex is included in the sample is derived. Based on this inclusion probability, estimators for the mean degree, the degree correlation, and the clustering coefficient are proposed. The performance of these estimators and their sensitivity to the response rate are assessed through Monte Carlo simulations on several test networks. Our approach has modest computational requirements and is straightforward to apply to real-world survey data. In a snowball sample design, each respondent is typically interviewed only once. Unlike the widely used estimator for Respondent-Driven Sampling (RDS), which assumes sampling with replacement, the proposed approach relies on sampling without replacement and is thus also applicable to large sample fractions. From the simulation experiments, we conclude that the estimation quality decreases with increasing variance of the network degree distribution. Yet, if the degree distribution is not too broad, our approach yields good estimates for the mean degree and the clustering coefficient, which, moreover, are almost independent of the response rate. The estimates for the degree correlation are of moderate quality.
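To illustrate the general idea of estimating the mean degree from unequal inclusion probabilities, the following is a minimal sketch of a generic inverse-probability-weighted (Hajek-style) ratio estimator. It is not the article's estimator: the approximate inclusion probability derived in the article is not reproduced here, so the probabilities are simply passed in as inputs. The function name `weighted_mean_degree` and the sample values are hypothetical and chosen only for illustration.

```python
def weighted_mean_degree(degrees, inclusion_probs):
    """Estimate the mean degree by weighting each sampled vertex by 1/pi_i.

    degrees         -- observed degrees of the sampled vertices
    inclusion_probs -- assumed inclusion probability pi_i of each vertex
                       (hypothetical inputs, not the article's formula)
    """
    if len(degrees) != len(inclusion_probs):
        raise ValueError("degrees and inclusion_probs must have equal length")
    if not degrees:
        raise ValueError("at least one sampled vertex is required")
    if any(p <= 0.0 or p > 1.0 for p in inclusion_probs):
        raise ValueError("inclusion probabilities must lie in (0, 1]")

    # Vertices that were unlikely to be sampled represent more of the population.
    weights = [1.0 / p for p in inclusion_probs]
    weighted_sum = sum(w * k for w, k in zip(weights, degrees))
    return weighted_sum / sum(weights)


# Hypothetical sample in which high-degree vertices are more likely to be reached.
sample_degrees = [2, 4, 8]
sample_probs = [0.1, 0.2, 0.4]

naive_estimate = sum(sample_degrees) / len(sample_degrees)
weighted_estimate = weighted_mean_degree(sample_degrees, sample_probs)
```

In this toy sample the weighted estimate is lower than the unweighted sample mean, because the high-degree vertices, which are over-represented in the sample, receive smaller weights.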
