Estimating Population Size Using the Network Scale Up Method

We develop methods for estimating the size of hard-to-reach populations from data collected using network-based questions on standard surveys. Such data arise by asking respondents how many people they know in a specific group (e.g., people named Michael, intravenous drug users). The Network Scale Up Method (NSUM) is a tool for producing population size estimates using these indirect measures of respondents' networks. Killworth et al. (1998a,b) proposed maximum likelihood estimators of population size for a fixed effects model in which respondents' degrees, or personal network sizes, are treated as fixed. We extend this by treating personal network sizes as random effects, yielding principled statements of uncertainty. This allows us to generalize the model to account for variation in people's propensity to know people in particular subgroups (barrier effects), such as their tendency to know people like themselves, as well as their lack of awareness of, or reluctance to acknowledge, their contacts' group memberships (transmission bias). NSUM estimates also suffer from recall bias, in which respondents tend to underestimate the number of members of larger groups that they know and to overestimate the number for smaller groups. We propose a data-driven adjustment method to deal with this. Our methods perform well in simulation studies, generating improved estimates and calibrated uncertainty intervals, as well as in back estimates of real sample data. We apply them to data from a study of HIV/AIDS prevalence in Curitiba, Brazil. Our results show that when transmission bias is present, external information about its likely extent can greatly improve the estimates. The methods are implemented in the NSUM R package.
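As a rough illustration of the basic scale-up idea behind the fixed effects estimators mentioned above, the following Python sketch first estimates each respondent's degree from groups of known size, d_i = N * (sum over k of y_ik) / (sum over k of N_k), and then scales up the reported contacts in the hidden group, N_u = N * (sum over i of y_iu) / (sum over i of d_i). It is only a sketch of that basic estimator: it is not the random effects model developed here, it ignores barrier effects, transmission bias, and recall bias, and it is not the API of the NSUM R package. The function names and the survey numbers are hypothetical and chosen purely for illustration.

```python
def estimate_degrees(known_counts, known_sizes, total_population):
    """Estimate each respondent's personal network size (degree).

    known_counts[i][k] is how many people respondent i reports knowing
    in known group k; known_sizes[k] is the true size of group k.
    """
    total_known = sum(known_sizes)
    return [total_population * sum(row) / total_known for row in known_counts]


def estimate_hidden_size(hidden_counts, degrees, total_population):
    """Scale up reported contacts in the hidden group to a population size.

    hidden_counts[i] is how many hidden-group members respondent i
    reports knowing; degrees[i] is that respondent's estimated degree.
    """
    return total_population * sum(hidden_counts) / sum(degrees)


# Hypothetical survey: three respondents, two groups of known size.
total_population = 1_000_000
known_sizes = [10_000, 40_000]
known_counts = [[2, 8], [1, 4], [4, 16]]
hidden_counts = [1, 0, 2]

degrees = estimate_degrees(known_counts, known_sizes, total_population)
hidden_size = estimate_hidden_size(hidden_counts, degrees, total_population)

print(degrees)
print(round(hidden_size))
```

Because the estimate is a ratio of totals, respondents with larger estimated degrees carry more weight, and any systematic under-reporting of hidden-group contacts passes straight through to the size estimate. That is the kind of bias the adjustments described above are meant to address.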

[1] Matthew J. Salganik, et al. Assessing Network Scale-up Estimates for Groups Most at Risk of HIV/AIDS: Evidence From a Multiple-Method Study of Heavy Drug Users in Curitiba, Brazil, 2011, American journal of epidemiology.

[2] Sylvia Richardson, et al. Markov Chain Monte Carlo in Practice, 1997.

[3] Perry de Valpine, et al. Better Inferences from Population-Dynamics Experiments Using Monte Carlo State-Space Likelihood Methods, 2003.

[4] H. Russell Bernard, et al. Scale-Up Methods as Applied to Estimates of Heroin Use, 2006.

[5] Tapabrata Maiti, et al. Analysis of Longitudinal Data (2nd ed.) (Book), 2004.

[6] Soichi Koike, et al. Population Size Estimation of Men Who Have Sex with Men through the Network Scale-Up Method in Japan, 2012, PloS one.

[7] H. Jeffreys, et al. Theory of probability, 1896.

[8] Paul W. Mielke. Convenient Beta Distribution Likelihood Techniques for Describing and Comparing Meteorological Data, 1975.

[9] Matthew J. Salganik, et al. How Many People Do You Know?: Efficiently Estimating Personal Network Size, 2010, Journal of the American Statistical Association.

[10] H. Russell Bernard, et al. Estimation of Seroprevalence, Rape, and Homelessness in the United States Using a Social Network Approach, 1998, Evaluation review.

[11] H. Russell Bernard, et al. A social network approach to estimating seroprevalence in the United States, 1998.

[12] Tyler H McCormick, et al. Latent Demographic Profile Estimation in Hard-to-Reach Groups, 2012, The annals of applied statistics.

[13] C. McCarty, et al. Comparing Two Methods for Estimating Network Size, 2001.

[14] Brian D. Ripley, et al. Regression techniques for the detection of analytical bias, 1987.

[15] H. Russell Bernard, et al. Investigating the Variation of Personal Network Size Under Unknown Error Conditions, 2006.

[16] Matthew J. Salganik, et al. The game of contacts: Estimating the social visibility of groups, 2011, Soc. Networks.

[17] Tian Zheng, et al. How Many People Do You Know in Prison?, 2006.

[18] Tian Zheng, et al. Adjusting for Recall Bias in “How Many X’s Do You Know?” Surveys, 2007.

[19] J. G. Skellam. A Probability Distribution Derived from the Binomial Distribution by Regarding the Probability of Success as Variable between the Sets of Trials, 1948.

[20] Adrian E. Raftery, et al. Inference for the binomial N parameter: A hierarchical Bayes approach, 1988.

[21] P. Diggle, et al. Analysis of Longitudinal Data, 2003.

[22] D. Rubin, et al. Inference from Iterative Simulation Using Multiple Sequences, 1992.

[23] H. Russell Bernard, et al. Two interpretations of reports of knowledge of subpopulation sizes, 2003, Soc. Networks.

[24] H. Russell Bernard, et al. Estimating the size of an average personal network and of an event subpopulation: Some empirical results, 1991.