Variance Estimation, Design Effects, and Sample Size Calculations for Respondent-Driven Sampling

Hidden populations, such as injection drug users and sex workers, are central to a number of public health problems. However, because of the nature of these groups, it is difficult to collect accurate information about them, and this difficulty complicates disease prevention efforts. A recently developed statistical approach called respondent-driven sampling improves our ability to study hidden populations by allowing researchers to make unbiased estimates of the prevalence of certain traits in these populations. Yet, not enough is known about the sample-to-sample variability of these prevalence estimates. In this paper, we present a bootstrap method for constructing confidence intervals around respondent-driven sampling estimates and demonstrate in simulations that it outperforms the naive method currently in use. We also use simulations and real data to estimate the design effects for respondent-driven sampling in a number of situations. We conclude with practical advice about the power calculations that are needed to determine the appropriate sample size for a study using respondent-driven sampling. In general, we recommend a sample size twice as large as would be needed under simple random sampling.
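As a rough illustration of the closing recommendation, the sketch below inflates a standard simple-random-sampling sample size for a proportion by a design effect, where the design effect is the ratio of the variance under respondent-driven sampling to the variance under simple random sampling. The function names, the normal-approximation formula, and the example inputs (an anticipated prevalence of 0.30 and a margin of error of 0.05) are assumptions chosen for illustration; only the default design effect of 2 comes from the recommendation above. It is not the paper's bootstrap procedure or its power calculation.

```python
import math


def srs_sample_size(p, margin, z=1.96):
    """Sample size to estimate a proportion p to within +/- margin
    under simple random sampling (normal approximation).

    z is the critical value for the desired confidence level
    (1.96 corresponds to roughly 95%).
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if margin <= 0.0:
        raise ValueError("margin must be positive")
    return math.ceil(z ** 2 * p * (1.0 - p) / margin ** 2)


def rds_sample_size(p, margin, design_effect=2.0, z=1.96):
    """Inflate the simple-random-sampling requirement by a design effect.

    design_effect = Var(RDS estimate) / Var(SRS estimate); the default
    of 2.0 mirrors the general recommendation in the abstract and is an
    assumption, not a value estimated for any particular study.
    """
    if design_effect <= 0.0:
        raise ValueError("design_effect must be positive")
    return math.ceil(design_effect * z ** 2 * p * (1.0 - p) / margin ** 2)


if __name__ == "__main__":
    # Hypothetical planning inputs, for illustration only.
    prevalence, margin = 0.30, 0.05
    print("SRS sample size:", srs_sample_size(prevalence, margin))
    print("RDS sample size:", rds_sample_size(prevalence, margin))
```

The design effect is kept as a parameter because the abstract reports estimating it in a number of situations; a study with reason to expect a different value could pass it in directly rather than rely on the default.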
