Tree-informed Bayesian multi-source domain adaptation: cross-population probabilistic cause-of-death assignment using verbal autopsy

Determining the causes of deaths (CODs) that occur outside of civil registration and vital statistics systems is challenging. In practice, a technique called verbal autopsy (VA) is widely adopted to gather information on such deaths. A VA consists of interviewing relatives of a deceased person about the symptoms the deceased experienced in the period leading up to death, often resulting in multivariate binary responses. While statistical methods have been devised for estimating the cause-specific mortality fractions (CSMFs) for a study population, continued expansion of VA to new populations (or "domains") necessitates approaches that recognize between-domain differences while capitalizing on potential similarities. In this paper, we propose such a domain-adaptive method that integrates external between-domain similarity information encoded by a pre-specified rooted weighted tree. Given a cause, we use latent class models to characterize the conditional distributions of the responses, which may vary by domain. We specify a logistic stick-breaking Gaussian diffusion process prior along the tree for the class mixing weights, with node-specific spike-and-slab priors to pool information between the domains in a data-driven way. Posterior inference is conducted via a scalable variational Bayes algorithm. Simulation studies show that the domain adaptation enabled by the proposed method improves CSMF estimation and individual COD assignment. We also illustrate and evaluate the method using a validation data set. The paper concludes with a discussion of limitations and future directions.
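The abstract names the model's building blocks without giving formulas. The following is a rough, illustrative sketch of how those pieces could fit together, and it is not the authors' implementation. It draws stick-breaking logits at every node of a rooted weighted tree. Each non-root node adds a Gaussian increment, with variance proportional to its branch length, to its parent's logits ("slab"), or else inherits them unchanged ("spike"). The logits are mapped to class mixing weights by logistic stick-breaking. The sketch then evaluates a latent class likelihood for a binary symptom vector and a Bayes-rule COD posterior given CSMFs. Several details are assumptions made for illustration: one spike-and-slab indicator per node, the specific variance scaling, and all names and parameters (`diffuse_along_tree`, `pi_slab`, `slab_sd`, `seed`, `cause_posterior`). Parameters are held fixed throughout, and no variational inference is shown.

```python
import math
import random


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def stick_breaking(etas):
    """Map K-1 real-valued logits to K mixing weights (logistic stick-breaking)."""
    weights, remaining = [], 1.0
    for eta in etas:
        v = sigmoid(eta)                # share of the remaining stick given to this class
        weights.append(remaining * v)
        remaining *= 1.0 - v
    weights.append(remaining)           # the last class receives what is left
    return weights


def diffuse_along_tree(parent, branch_len, n_sticks, slab_sd=1.0, pi_slab=0.5, seed=0):
    """Hypothetical prior draw of stick-breaking logits at every tree node.

    parent: dict node -> parent (None for the root), ordered so parents precede children.
    branch_len: dict node -> length of the edge from its parent.
    Assumption: with probability pi_slab a node adds a Gaussian increment with
    variance slab_sd**2 * branch length ("slab"); otherwise the increment is exactly
    zero ("spike") and the node inherits its parent's logits.
    """
    rng = random.Random(seed)
    eta = {}
    for node, par in parent.items():
        if par is None:
            eta[node] = [rng.gauss(0.0, slab_sd) for _ in range(n_sticks)]
            continue
        slab = rng.random() < pi_slab
        sd = slab_sd * math.sqrt(branch_len[node])
        eta[node] = [e + (rng.gauss(0.0, sd) if slab else 0.0) for e in eta[par]]
    return eta


def logsumexp(vals):
    m = max(vals)
    return m + math.log(sum(math.exp(v - m) for v in vals))


def latent_class_loglik(x, weights, probs):
    """log P(x) = log sum_k w_k prod_j p_kj^x_j (1 - p_kj)^(1 - x_j) for binary x."""
    terms = []
    for w, p in zip(weights, probs):
        ll = math.log(w)
        for xj, pj in zip(x, p):
            ll += math.log(pj) if xj else math.log(1.0 - pj)
        terms.append(ll)
    return logsumexp(terms)


def cause_posterior(x, csmf, class_models):
    """P(cause c | x) proportional to CSMF_c * P(x | cause c), with fixed parameters.

    class_models[c] = (weights, probs) for cause c in the domain of interest.
    """
    logs = [math.log(pi) + latent_class_loglik(x, *class_models[c])
            for c, pi in enumerate(csmf)]
    norm = logsumexp(logs)
    return [math.exp(l - norm) for l in logs]


if __name__ == "__main__":
    # Toy tree: root -> {A, B}, A -> {A1, A2}; the leaves play the role of domains.
    parent = {"root": None, "A": "root", "B": "root", "A1": "A", "A2": "A"}
    branch_len = {"A": 1.0, "B": 1.0, "A1": 0.5, "A2": 0.5}
    eta = diffuse_along_tree(parent, branch_len, n_sticks=2)
    w = stick_breaking(eta["A1"])       # three class weights for domain A1
    probs = [[0.9, 0.1, 0.8], [0.2, 0.7, 0.3], [0.5, 0.5, 0.5]]
    print("class weights:", w)
    print("log-lik:", latent_class_loglik([1, 0, 1], w, probs))
    models = [(w, probs), ([1.0], [[0.1, 0.9, 0.2]])]   # cause 0: 3 classes; cause 1: 1 class
    print("P(cause | x):", cause_posterior([1, 0, 1], [0.6, 0.4], models))
```

In this sketch, a "spike" at a node lets that node reproduce its parent's mixing weights exactly. This is one simple way a tree can encourage nearby domains to share structure. Whether and how strongly sharing occurs in the actual method is governed by the paper's priors and its variational Bayes fit, which are not reproduced here.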
