A Bayesian nonparametric model for zero‐inflated outcomes: Prediction, clustering, and causal estimation

Researchers are often interested in predicting outcomes, detecting distinct subgroups of their data, or estimating causal treatment effects. Pathological data distributions that exhibit skewness and zero-inflation complicate these tasks, requiring highly flexible, data-adaptive modeling. In this paper, we present a multi-purpose Bayesian nonparametric model for continuous, zero-inflated outcomes that simultaneously predicts structural zeros, captures skewness, and clusters patients with similar joint data distributions. The flexibility of our approach yields predictions that capture the joint data distribution better than commonly used zero-inflated methods. Moreover, we demonstrate that our model can be coherently incorporated into a standardization procedure for computing causal effect estimates that are robust to such data pathologies. Uncertainty at all levels of this model flows through to the causal effect estimates of interest, allowing easy point estimation, interval estimation, and posterior predictive checks verifying positivity, a required causal identification assumption. Our simulation results show that point estimates have low bias and interval estimates have close to nominal coverage under complicated data settings. Under simpler settings, these results hold while incurring lower efficiency loss than comparator methods. We use our proposed method to analyze zero-inflated inpatient medical costs among endometrial cancer patients receiving either chemotherapy or radiation therapy in the SEER-Medicare database.
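To make the idea of standardization with a zero-inflated outcome concrete, the following is a minimal sketch and not the model proposed in the paper. It assumes a single binary confounder, replaces the Bayesian nonparametric model with a simple plug-in two-part model (a zero probability and a positive-part mean estimated within each treatment-by-confounder cell), and uses simulated data. The function names `simulate` and `standardized_mean` and all parameter values are hypothetical choices made for illustration.

```python
import numpy as np


def simulate(n, seed=0):
    """Simulate a confounded treatment and a zero-inflated, right-skewed outcome.

    All parameter values here are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    l = rng.binomial(1, 0.5, n)                 # binary confounder
    a = rng.binomial(1, 0.3 + 0.4 * l)          # treatment depends on confounder
    p_zero = 0.5 - 0.2 * a + 0.1 * l            # probability of a zero outcome
    is_zero = rng.random(n) < p_zero
    y_pos = rng.lognormal(mean=1.0 + 0.5 * a + 0.3 * l, sigma=0.5)
    y = np.where(is_zero, 0.0, y_pos)           # zero-inflated outcome
    return y, a, l


def standardized_mean(y, a, l, a_star):
    """Plug-in standardized mean of Y under treatment a_star.

    Within each confounder level, a two-part model gives
    E[Y | A, L] = (1 - P(Y = 0 | A, L)) * E[Y | Y > 0, A, L],
    which is then averaged over the empirical distribution of L.
    """
    total = 0.0
    for level in np.unique(l):
        cell = (a == a_star) & (l == level)
        p_zero = np.mean(y[cell] == 0)               # part 1: zero probability
        pos = y[cell][y[cell] > 0]
        mean_pos = pos.mean() if pos.size else 0.0   # part 2: positive-part mean
        total += (1.0 - p_zero) * mean_pos * np.mean(l == level)
    return total


y, a, l = simulate(200_000)
effect = standardized_mean(y, a, l, 1) - standardized_mean(y, a, l, 0)
print(f"standardized effect estimate: {effect:.3f}")
```

In a fully Bayesian analysis the same averaging would be repeated for each posterior draw of the model parameters, so that the resulting draws of the effect carry the model's uncertainty; this sketch only produces a single point estimate.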
