7. Respondent-Driven Sampling: An Assessment of Current Methodology

Respondent-driven sampling (RDS) uses a variant of link-tracing network sampling to collect data from hard-to-reach populations. By tracing the links in the underlying social network, the process exploits the social structure to expand the sample and reduce its dependence on the initial (convenience) sample. The current estimators of population averages make strong assumptions in order to treat the data as a probability sample. We evaluate three critical sensitivities of the estimators: (1) to bias induced by the initial sample, (2) to uncontrollable features of respondent behavior, and (3) to the without-replacement structure of sampling. Our analysis indicates (1) that the convenience sample of seeds can induce bias, and that the number of sample waves typically used in RDS is likely insufficient for the type of nodal mixing required to obtain the reputed asymptotic unbiasedness; (2) that preferential referral behavior by respondents leads to bias; and (3) that when a substantial fraction of the target population is sampled, the current estimators can have substantial bias. This paper sounds a cautionary note for users of RDS. While current RDS methodology is powerful and clever, the favorable statistical properties claimed for the current estimates are shown to depend heavily on often unrealistic assumptions. We recommend ways to improve the methodology.
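
To make "treating the data as a probability sample" concrete, the following is a minimal sketch of one widely used RDS estimator, the inverse-degree-weighted mean often called RDS-II or the Volz-Heckathorn estimator. The abstract does not say which estimators it evaluates in detail, so the choice of this one is an assumption made for illustration. The function name `rds_ii_estimate` and the four-respondent sample are hypothetical. The estimator assumes each respondent is sampled with probability proportional to their network degree, which is the kind of strong assumption the assessment calls into question.

```python
def rds_ii_estimate(values, degrees):
    """Inverse-degree-weighted mean of `values` (RDS-II form).

    Assumes inclusion probability is proportional to self-reported degree,
    so each respondent is weighted by 1 / degree.
    """
    if len(values) != len(degrees):
        raise ValueError("values and degrees must have the same length")
    if any(d <= 0 for d in degrees):
        raise ValueError("every degree must be positive")
    weights = [1.0 / d for d in degrees]
    weighted_sum = sum(w * y for w, y in zip(weights, values))
    return weighted_sum / sum(weights)


# Hypothetical sample: 1 = has the trait, 0 = does not.
# The two high-degree respondents both have the trait.
trait = [1, 1, 0, 0]
degrees = [10, 10, 2, 2]

naive_mean = sum(trait) / len(trait)              # unweighted sample mean
weighted_mean = rds_ii_estimate(trait, degrees)   # down-weights high-degree respondents

print(naive_mean, weighted_mean)
```

In this toy sample the weighted estimate falls below the unweighted mean because the high-degree respondents, who are assumed to be over-represented in a link-tracing sample, are down-weighted. The weighting corrects only for degree; it does nothing about seed selection, preferential referral, or sampling without replacement, which are the three sensitivities the abstract raises.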
