Bayesian Nested Latent Class Models for Cause-of-Death Assignment using Verbal Autopsies Across Multiple Domains

Understanding cause-specific mortality rates is crucial for monitoring population health and designing public health interventions. Worldwide, two-thirds of deaths do not have a cause assigned. Verbal autopsy (VA) is a well-established tool for collecting information about deaths that occur outside of hospitals by surveying caregivers of the deceased. It is routinely implemented in many low- and middle-income countries. Statistical algorithms that assign cause of death using VAs are typically vulnerable to distribution shift between the data used to train the model and the target population. This presents a major challenge for analyzing VAs, as labeled data are usually unavailable in the target population. This article proposes a Latent Class model framework for VA data (LCVA) that jointly models VAs collected over multiple heterogeneous domains, assigns cause of death for out-of-domain observations, and estimates cause-specific mortality fractions for a new domain. We introduce a parsimonious representation of the joint distribution of the collected symptoms using nested latent class models and develop an efficient algorithm for posterior inference. We demonstrate that LCVA outperforms existing methods in predictive performance and scalability. Supplementary materials for this article and the R package to implement the model are available online.
