Inference for BART with Multinomial Outcomes

arXiv:2101.06823v1 [stat.ME], 18 Jan 2021

The multinomial probit Bayesian additive regression trees (MPBART) framework was proposed by Kindo et al. [10] (KD); it approximates the latent utilities in the multinomial probit (MNP) model with BART [19]. Unlike multinomial logistic models, MNP does not assume independent alternatives, and the correlation structure among alternatives can be specified through latent utilities that follow a multivariate Gaussian distribution. We introduce two new algorithms for fitting MPBART and show that their theoretical mixing rates are equal to or better than those of the existing algorithm in KD. Through simulations, we explore the robustness of the methods to the choice of reference level, imbalance in outcome frequencies, and the specification of prior hyperparameters for the utility error term. The work is motivated by the application of generating posterior predictive distributions for mortality and engagement in care among HIV-positive patients based on electronic health records (EHRs) from the Academic Model Providing Access to Healthcare (AMPATH) in Kenya. In both the application and the simulations, our proposals outperform KD in terms of MCMC convergence rate and posterior predictive accuracy.
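To make the latent-utility formulation concrete, the following is a minimal sketch of how outcomes arise in a standard MNP model with a reference level. It is not the authors' algorithm and does not fit BART: the function names `simulate_mnp` and `toy_mean`, the step-function mean (a hypothetical stand-in for a sum-of-trees fit), and the covariance matrix `Sigma` are all assumptions chosen for illustration.

```python
import numpy as np


def simulate_mnp(X, mean_fn, Sigma, rng):
    """Draw outcomes from a multinomial probit model with reference level 0.

    With C alternatives, each observation has C-1 latent utilities measured
    relative to the reference level. The outcome is the reference level when
    every utility is negative, otherwise the alternative with the largest one.
    """
    mu = mean_fn(X)                                   # shape (n, C-1)
    n, k = mu.shape
    # Correlated Gaussian errors: this is what lets MNP relax the
    # independent-alternatives assumption of multinomial logistic models.
    eps = rng.multivariate_normal(np.zeros(k), Sigma, size=n)
    W = mu + eps                                      # latent utilities
    best = W.argmax(axis=1)
    y = np.where(W.max(axis=1) < 0.0, 0, best + 1)
    return y, W


def toy_mean(X):
    # Hypothetical stand-in for the sum-of-trees mean that MPBART would learn:
    # one piecewise-constant function per non-reference alternative.
    return np.column_stack([
        np.where(X[:, 0] > 0.5, 1.0, -1.0),
        np.where(X[:, 1] > 0.5, 0.5, -0.5),
    ])


rng = np.random.default_rng(0)
X = rng.uniform(size=(1000, 2))
Sigma = np.array([[1.0, 0.5],
                  [0.5, 1.0]])                        # assumed error covariance
y, W = simulate_mnp(X, toy_mean, Sigma, rng)
```

Here `y` takes values in {0, 1, 2}, with 0 denoting the reference level. Changing which alternative is coded as the reference changes the scale on which `Sigma` is specified, which is one reason the sensitivity to the reference level is worth studying.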

[1]  Carlos Del Rio, et al. The spectrum of engagement in HIV care and its relevance to test-and-treat strategies for prevention of HIV infection, 2011, Clinical infectious diseases: an official publication of the Infectious Diseases Society of America.

[2]  Agostino Nobile, et al. A hybrid Markov chain for the Bayesian analysis of the multinomial probit model, 1998, Stat. Comput.

[3]  Jared S. Murray, et al. Log-Linear Bayesian Additive Regression Trees for Categorical and Count Responses, 2017, arXiv:1701.01503.

[4]  Jun S. Liu, et al. Covariance structure of the Gibbs sampler with applications to the comparisons of estimators and augmentation schemes, 1994.

[5]  D. Rubin, et al. Maximum likelihood from incomplete data via the EM algorithm plus discussions on the paper, 1977.

[6]  Purushottam W. Laud, et al. Nonparametric survival analysis using Bayesian Additive Regression Trees (BART), 2016, Statistics in medicine.

[7]  Xiao-Li Meng, et al. The Art of Data Augmentation, 2001.

[8]  Peter E. Rossi, et al. An exact likelihood analysis of the multinomial probit model, 1994.

[9]  S. Chib, et al. Bayesian analysis of binary and polychotomous response data, 1993.

[10]  Hao Wang, et al. Multinomial probit Bayesian additive regression trees, 2016, Stat.

[11]  Jun S. Liu, et al. The Collapsed Gibbs Sampler in Bayesian Computations with Applications to a Gene Regulation Problem, 1994.

[12]  P. Waldmann. Genome-wide prediction using Bayesian additive regression trees, 2016, Genetics Selection Evolution.

[13]  Joseph Kang, et al. Demystifying Double Robustness: A Comparison of Alternative Strategies for Estimating a Population Mean from Incomplete Data, 2007, arXiv:0804.2958.

[14]  Samuel J Clark, et al. Using Bayesian Latent Gaussian Graphical Models to Infer Symptom Associations in Verbal Autopsies, 2017, Bayesian analysis.

[15]  H. Chipman, et al. Bayesian CART Model Search, 1998.

[16]  Jun S. Liu, et al. Parameter Expansion for Data Augmentation, 1999.

[17]  Thomas J. Steenburgh. The Invariant Proportion of Substitution Property (IPS) of Discrete-Choice Models, 2008, Mark. Sci.

[18]  Lane F Burgette, et al. The Trace Restriction: An Alternative Identification Strategy for the Bayesian Multinomial Probit Model, 2012.

[19]  H. Chipman, et al. BART: Bayesian Additive Regression Trees, 2008, arXiv:0806.3286.

[20]  Tian Xia, et al. A Bayesian regression tree approach to identify the effect of nanoparticles’ properties on toxicity profiles, 2015, arXiv:1506.00403.

[21]  D. V. Dyk, et al. A Bayesian analysis of the multinomial probit model using marginal data augmentation, 2005.

[22]  D. McFadden. Conditional logit analysis of qualitative choice behavior, 1972.

[23]  Xiao-Li Meng, et al. Seeking efficient data augmentation schemes via conditional and marginal augmentation, 1999.

[24]  W. Wong, et al. The calculation of posterior distributions by data augmentation, 1987.

[25]  Peter E. Rossi, et al. A Bayesian analysis of the multinomial probit model with fully identified parameters, 2000.