Estimating the Size and Distribution of Networked Populations with Snowball Sampling

A new strategy is introduced for estimating the characteristics of networked populations. Sample selection is based on the one-wave snowball sampling design. A generalized stochastic block model is posited for the population's network topology. Inference is based on a Bayesian data augmentation procedure. This procedure has an advantage over existing methods in that it can be applied to a networked population of unknown size. The strategy is applied to an empirical study of a population at risk for HIV/AIDS. The results demonstrate that efficient estimates of the size and distribution of the population can be achieved with this novel strategy.
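
To make the sampling design concrete, the following is a minimal sketch of a one-wave snowball sample drawn from a simulated network. It is an illustration under stated assumptions, not the paper's procedure: it uses an ordinary (not generalized) stochastic block model, selects the initial sample by independent Bernoulli draws, and all function names, block sizes, and probabilities are hypothetical. The Bayesian data augmentation step is not shown.

```python
import random


def simulate_sbm(block_sizes, edge_prob, rng):
    """Simulate an undirected stochastic block model.

    Nodes i and j are linked independently with probability
    edge_prob[b_i][b_j], where b_i is the block of node i.
    """
    blocks = [b for b, size in enumerate(block_sizes) for _ in range(size)]
    n = len(blocks)
    neighbors = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < edge_prob[blocks[i]][blocks[j]]:
                neighbors[i].add(j)
                neighbors[j].add(i)
    return blocks, neighbors


def one_wave_snowball(neighbors, seed_prob, rng):
    """Draw an initial Bernoulli sample, then add everyone its members nominate."""
    # Assumption: each individual enters the initial sample independently.
    seeds = {i for i in neighbors if rng.random() < seed_prob}
    # First wave: all neighbours of the initial sample not already selected.
    wave1 = set().union(*(neighbors[i] for i in seeds)) - seeds
    return seeds, wave1


# Hypothetical population: two blocks, dense within and sparse between.
rng = random.Random(0)
block_sizes = [60, 40]
edge_prob = [[0.10, 0.01],
             [0.01, 0.15]]
blocks, neighbors = simulate_sbm(block_sizes, edge_prob, rng)
seeds, wave1 = one_wave_snowball(neighbors, seed_prob=0.1, rng=rng)
print(len(seeds), "seeds;", len(wave1), "added in the first wave")
```

In a setting like the one the abstract describes, only the sampled individuals and their links would be observed, and the total number of nodes (here fixed at 100 for the simulation) would be the unknown quantity to estimate.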
