Probabilistic Cause-of-disease Assignment using Case-control Diagnostic Tests: A Latent Variable Regression Approach

Optimal prevention and treatment strategies for a disease of multiple causes, such as pneumonia, must be informed by the population distribution of causes among cases, or cause-specific case fractions (CSCFs). CSCFs may further depend on additional explanatory variables. Existing methodological literature in disease etiology research does not fully address the regression problem, particularly under a case-control design. Based on multivariate binary non-gold-standard diagnostic data and additional covariate information, this paper proposes a novel and unified regression modeling framework for estimating covariate-dependent CSCF functions in case-control disease etiology studies. The model leverages critical control data for valid probabilistic cause assignment for cases. We derive an efficient Markov chain Monte Carlo algorithm for flexible posterior inference. We illustrate the inference of CSCF functions using extensive simulations and show that the proposed model produces less biased estimates and more valid inference of the overall CSCFs than analyses that omit covariates. A regression analysis of pediatric pneumonia data reveals the dependence of CSCFs on season, age, HIV status and disease severity. The paper concludes with a brief discussion of model extensions that may further enhance the utility of the regression model in disease etiology research.
