New estimates for network sampling

Network sampling is used around the world for surveys of vulnerable, hard-to-reach populations, including people at risk for HIV, opioid misuse, and emerging epidemics. These methods grow the sample by tracing social links from people already in it to new people. Current estimates from such surveys are inaccurate: they have large biases and mean squared errors, and their confidence intervals are unreliable. New estimators are introduced here that eliminate almost all of the bias, have much lower mean squared error, and support confidence intervals with good properties. The improvement comes from avoiding unrealistic assumptions about the population network and the design, and instead using the topology of the sample network data together with the sampling design actually used. In simulations on the real network of an at-risk population, the new estimators remove nearly all the bias and have mean squared errors 2 to 92 times lower than those of current estimators. They are effective with a wide variety of network designs, including designs with strongly restricted branching, such as Respondent-Driven Sampling, and freely branching designs, such as Snowball Sampling.
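To make the ideas of restricted versus free branching, bias, and mean squared error concrete, the Python sketch below runs a link-tracing design on a hypothetical random toy network. This is not the real at-risk network from the simulations above. It also does not implement the new estimators introduced here. Instead, it compares a plain sample mean with a common inverse-degree-weighted estimator (the Volz-Heckathorn, or RDS-II, form). Branching is capped by a hypothetical `max_recruits` parameter. The function names (`make_toy_network`, `link_trace`, `bias_and_mse`, `simulate`) and the degree-linked trait are illustrative assumptions.

```python
import random
import statistics
from collections import deque


def make_toy_network(n, p, seed):
    """Hypothetical random graph (each pair linked with probability p),
    stored as an adjacency dict {node: set(neighbors)}."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adj[u].add(v)
                adj[v].add(u)
    return adj


def link_trace(adj, n_seeds, target, max_recruits, rng):
    """Grow a sample by following social links from random seeds.

    max_recruits (hypothetical) caps how many new neighbors each member
    may bring in: a small cap mimics the coupon limit of Respondent-Driven
    Sampling, while None lets branching run freely, as in snowball sampling.
    """
    seeds = rng.sample(list(adj), n_seeds)
    sample, in_sample = list(seeds), set(seeds)
    queue = deque(seeds)
    while queue and len(sample) < target:
        recruiter = queue.popleft()
        # Unsampled neighbors, shuffled so recruitment order is random.
        candidates = [v for v in sorted(adj[recruiter]) if v not in in_sample]
        rng.shuffle(candidates)
        if max_recruits is not None:
            candidates = candidates[:max_recruits]
        for v in candidates:
            if len(sample) >= target:
                break
            sample.append(v)
            in_sample.add(v)
            queue.append(v)
    return sample


def bias_and_mse(estimates, truth):
    """Empirical bias and mean squared error over repeated samples."""
    bias = statistics.fmean(estimates) - truth
    mse = statistics.fmean((e - truth) ** 2 for e in estimates)
    return bias, mse


def simulate(adj, y, max_recruits, reps=200, target=40, n_seeds=4, seed=1):
    """Compare two estimators of the population mean of y."""
    rng = random.Random(seed)
    truth = statistics.fmean(y[v] for v in adj)
    naive, weighted = [], []
    for _ in range(reps):
        s = link_trace(adj, n_seeds, target, max_recruits, rng)
        # Plain sample mean: ignores that well-connected people are
        # more likely to be reached by link tracing.
        naive.append(statistics.fmean(y[v] for v in s))
        # Inverse-degree weighting (RDS-II style); max(..., 1) guards
        # against isolated seeds with degree 0.
        w = [1 / max(len(adj[v]), 1) for v in s]
        weighted.append(sum(wi * y[v] for wi, v in zip(w, s)) / sum(w))
    return {"sample mean": bias_and_mse(naive, truth),
            "inverse degree": bias_and_mse(weighted, truth)}


if __name__ == "__main__":
    adj = make_toy_network(n=300, p=0.03, seed=7)
    # Hypothetical trait tied to degree, so link tracing over-samples it.
    mean_deg = statistics.fmean(len(nb) for nb in adj.values())
    y = {v: int(len(adj[v]) > mean_deg) for v in adj}
    for label, cap in [("RDS-like, 3 coupons", 3), ("snowball, unlimited", None)]:
        for name, (bias, mse) in simulate(adj, y, cap).items():
            print(f"{label:20s} {name:15s} bias={bias:+.3f} mse={mse:.4f}")
```

In this toy model the only difference between the two designs is the value of `max_recruits`. Real Respondent-Driven Sampling and snowball protocols also differ in seed selection, incentives, and other details that the sketch does not attempt to capture. The reported mean squared error always equals the squared bias plus the variance of the estimates across replications. A low-bias estimator can therefore still have a large mean squared error if its variance is high.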
