A Simultaneous Equation Approach to Estimating HIV Prevalence With Nonignorable Missing Responses

ABSTRACT Estimates of HIV prevalence matter for policy, both to establish the health status of a country’s population and to evaluate the effectiveness of population-based interventions and campaigns. However, many of these estimates are based on HIV testing conducted as part of household surveys, where participation rates can be low. HIV-positive individuals may be less likely to participate because they fear disclosure, in which case estimates obtained using conventional approaches to missing data, such as imputation-based methods, will be biased. We develop a Heckman-type simultaneous equation approach that accounts for nonignorable selection but, unlike previous implementations, allows for spatial dependence and does not impose a homogeneous selection process on all respondents. In addition, our framework addresses the issue of separation, which arises, for instance, when some factors are severely unbalanced and highly predictive of the response, and which would ordinarily prevent model convergence. Estimation is carried out within a penalized likelihood framework where smoothing is achieved using a parameterization of the smoothing criterion that makes estimation more stable and efficient. We provide software for straightforward implementation of the proposed approach, and apply our methodology to estimating national and sub-national HIV prevalence in Swaziland, Zimbabwe, and Zambia. Supplementary materials for this article are available online.
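
For orientation, the textbook bivariate probit model with sample selection, the simplest member of the Heckman-type family the abstract refers to, can be sketched as follows. This is an illustrative baseline only, not the specification used in the article: the symbols $\mathbf{x}_i$, $\mathbf{z}_i$, $\boldsymbol\beta_s$, $\boldsymbol\beta_y$, $\eta_{si}$, $\eta_{yi}$, $\rho$, $\lambda_k$ and $\mathbf{D}_k$ are notation assumed here, and the penalty term is a generic quadratic smoothing penalty. The article's extensions (spatial dependence, heterogeneous selection, handling of separation) are not shown.

```latex
% Selection (test participation) and outcome (HIV status) equations
\begin{align*}
S_i^* &= \eta_{si} + \varepsilon_{1i}, & S_i &= \mathbb{1}(S_i^* > 0), & \eta_{si} &= \mathbf{x}_i^\top \boldsymbol\beta_s,\\
Y_i^* &= \eta_{yi} + \varepsilon_{2i}, & Y_i &= \mathbb{1}(Y_i^* > 0), & \eta_{yi} &= \mathbf{z}_i^\top \boldsymbol\beta_y,
\end{align*}
% Y_i is observed only when S_i = 1; the errors are jointly normal
\[
(\varepsilon_{1i}, \varepsilon_{2i})^\top \sim
N_2\!\left(\mathbf{0},
\begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}\right).
\]
% Log-likelihood: non-participants, participants testing positive,
% participants testing negative
\[
\ell(\boldsymbol\beta_s, \boldsymbol\beta_y, \rho)
= \sum_{i:\,S_i=0} \log \Phi(-\eta_{si})
+ \sum_{i:\,S_i=1,\,Y_i=1} \log \Phi_2(\eta_{si}, \eta_{yi}; \rho)
+ \sum_{i:\,S_i=1,\,Y_i=0} \log \Phi_2(\eta_{si}, -\eta_{yi}; -\rho).
\]
% Generic penalized version (assumed form of the penalty)
\[
\ell_p = \ell - \tfrac{1}{2} \sum_k \lambda_k\, \boldsymbol\beta^\top \mathbf{D}_k \boldsymbol\beta,
\qquad \boldsymbol\beta = (\boldsymbol\beta_s^\top, \boldsymbol\beta_y^\top)^\top.
\]
```

Here $\Phi$ and $\Phi_2$ denote the univariate and bivariate standard normal distribution functions. In this baseline, $\rho = 0$ means that participation and HIV status are independent given the covariates, so the missing responses are ignorable; $\rho \neq 0$ corresponds to the nonignorable selection the abstract describes.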
