Bayesian Modeling for Exposure Response Curve via Gaussian Processes: Causal Effects of Exposure to Air Pollution on Health Outcomes

Motivated by environmental health research on air pollution, we address the challenge of estimating the causal exposure-response function (CERF) and quantifying its uncertainty. The CERF describes the relationship between a continuously varying exposure (or treatment) and its causal effect on an outcome. We propose a new Bayesian approach that relies on a Gaussian process (GP) model to estimate the CERF. We parametrize the covariance (kernel) function of the GP to mimic matching via a Generalized Propensity Score (GPS). The tuning parameters of the matching function are chosen to optimize covariate balance. Our approach achieves automatic uncertainty evaluation of the CERF with high computational efficiency, enables change point detection through inference on derivatives of the CERF, and yields the desired separation of design and analysis phases for causal estimation. We provide theoretical results showing the correspondence between our Bayesian GP framework and traditional approaches in causal inference for estimating causal effects of a continuous exposure. We apply the methods to 520,711 ZIP-code-level observations to estimate the causal effect of long-term exposures to PM2.5 on all-cause mortality among Medicare enrollees in the United States.
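To make the idea of a GP kernel that mimics GPS matching more concrete, the following is a minimal sketch and not the paper's implementation. It assumes a normal linear model for the GPS, a squared-exponential kernel on (exposure, GPS) pairs, and a fixed noise variance. The function names and the parameters `alpha`, `beta`, and `noise` are hypothetical, and the step that tunes them to optimize covariate balance is not shown.

```python
import numpy as np


def fit_gps_model(C, w):
    """Fit the assumed GPS model W | C ~ N(mean, sigma^2) by least squares.

    Returns the fitted conditional mean of W for each unit and the residual
    standard deviation. (Assumption: a normal linear model for illustration.)
    """
    X = np.column_stack([np.ones(len(w)), C])
    coef, *_ = np.linalg.lstsq(X, w, rcond=None)
    fitted = X @ coef
    sigma = (w - fitted).std(ddof=X.shape[1])
    return fitted, sigma


def normal_density(w, mean, sigma):
    """Normal density of exposure level w given the conditional mean."""
    return np.exp(-0.5 * ((w - mean) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))


def sq_exp_kernel(a, b, alpha, beta):
    """Squared-exponential kernel on (exposure, GPS) pairs.

    a has shape (n, 2) and b has shape (m, 2). alpha scales the exposure
    distance and beta scales the GPS distance (hypothetical tuning parameters).
    """
    d2 = (alpha * (a[:, None, 0] - b[None, :, 0]) ** 2
          + beta * (a[:, None, 1] - b[None, :, 1]) ** 2)
    return np.exp(-d2)


def cerf_posterior_mean(w_grid, w, y, mean_w, sigma, alpha=1.0, beta=1.0, noise=1.0):
    """GP posterior mean of the outcome, averaged over units, at each grid level."""
    obs = np.column_stack([w, normal_density(w, mean_w, sigma)])
    K = sq_exp_kernel(obs, obs, alpha, beta) + noise * np.eye(len(w))
    y_bar = y.mean()  # centre the outcome so a zero-mean GP prior is reasonable
    K_inv_y = np.linalg.solve(K, y - y_bar)
    estimates = []
    for w0 in w_grid:
        # Each unit is placed at exposure w0 with its own GPS evaluated at w0.
        new = np.column_stack([np.full(len(w), w0), normal_density(w0, mean_w, sigma)])
        estimates.append(y_bar + (sq_exp_kernel(new, obs, alpha, beta) @ K_inv_y).mean())
    return np.array(estimates)
```

In this sketch the posterior mean at each exposure level is a weighted sum of observed outcomes, with weights that decay as observations move away in exposure and GPS. That is the sense in which a kernel of this kind can act like a soft form of matching.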
