Modeling and Analysing Respondent Driven Sampling as a Counting Process

Respondent-driven sampling (RDS) is an approach to sampling design and analysis that uses chain referral to exploit the networks of social relationships connecting members of the target population. RDS typically oversamples participants with many acquaintances. Naïve estimators, such as the sample average, are thus biased towards the state of the most highly connected individuals. Current methodology cannot estimate population size from RDS, and promotes inverse-probability-weighted estimators for population parameters such as HIV prevalence. We propose to use the timing of recruitment, which is typically collected but then discarded, to estimate the population size via a counting process model. Once population size and degree frequencies are available, prevalence can be debiased in a post-stratified framework. We adapt methods developed for inference in epidemiology and software reliability to estimate the population size, degree counts and degree frequencies. A fundamental advantage of our approach is that it makes the assumptions of the sampling design explicit. This enables verification of the assumptions, maximum likelihood estimation, extension with covariates, and model selection. We develop large-sample theory, proving consistency and asymptotic normality. We further compare our estimators to other estimators in the RDS literature on both simulated and real-world data; in both cases, our estimators outperform current methods. The likelihood problem in the model we present is separable, and thus efficiently solvable. We implement these estimators in an accompanying R package, chords, available on CRAN.
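To make the bias argument concrete, the following is a minimal Python sketch, not the estimators implemented in chords. It contrasts the naïve sample mean with a generic inverse-degree-weighted mean (assuming, for illustration only, that inclusion probability is proportional to reported degree) and with a post-stratified mean that takes per-degree population counts as given. The function names, the toy data, and the degree counts are all hypothetical; in the approach described above, the degree counts would come from the counting process model rather than being supplied by hand.

```python
from collections import defaultdict


def naive_mean(y):
    """Plain sample average; biased if high-degree people are oversampled."""
    return sum(y) / len(y)


def inverse_degree_weighted_mean(y, degrees):
    """Weighted mean with weights 1/degree.

    Assumption (illustrative): inclusion probability is proportional
    to each respondent's reported degree.
    """
    weights = [1.0 / d for d in degrees]
    return sum(w * yi for w, yi in zip(weights, y)) / sum(weights)


def post_stratified_mean(y, degrees, degree_counts):
    """Post-stratified mean using degree as the stratum.

    degree_counts maps degree -> (estimated) number of people in the
    population with that degree. Here it is a hypothetical input.
    """
    totals = defaultdict(float)  # sum of outcomes per degree stratum
    sizes = defaultdict(int)     # sample size per degree stratum
    for yi, d in zip(y, degrees):
        totals[d] += yi
        sizes[d] += 1
    # Weight each stratum's sample mean by its population count.
    numerator = sum(degree_counts[d] * totals[d] / sizes[d] for d in sizes)
    denominator = sum(degree_counts[d] for d in sizes)
    return numerator / denominator


# Toy sample: high-degree respondents are oversampled and all positive.
degrees = [1, 1, 4, 4, 4, 4]
outcome = [0, 0, 1, 1, 1, 1]
# Hypothetical population degree counts: 80 people of degree 1, 20 of degree 4.
counts = {1: 80, 4: 20}

print(naive_mean(outcome))
print(inverse_degree_weighted_mean(outcome, degrees))
print(post_stratified_mean(outcome, degrees, counts))
```

In this toy sample the naïve mean is pulled towards the high-degree stratum, while the two adjusted estimates are lower; how much lower depends entirely on the assumed weights or degree counts.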
