A causal inference framework for spatial confounding

Recently, addressing spatial confounding has become a major topic in spatial statistics. However, the literature offers conflicting definitions, and many of them do not address confounding as it is understood in causal inference. We define spatial confounding as the existence of an unmeasured causal confounder with a spatial structure. We present a causal inference framework for nonparametric identification of the causal effect of a continuous exposure on an outcome in the presence of spatial confounding. For estimation, we propose using double machine learning (DML). In this procedure, flexible models regress both the exposure and the outcome on the confounders, and the result is a causal estimator with favorable robustness properties and convergence rates. We prove that this estimator is consistent and asymptotically normal under spatial dependence. As far as we are aware, this is the first approach to spatial confounding that relies on restrictive parametric assumptions (such as linearity, effect homogeneity, or Gaussianity) for neither identification nor estimation. We demonstrate the advantages of the DML approach analytically and in simulations. We apply our methods and reasoning to a study of the effect of fine particulate matter exposure during pregnancy on birthweight in California.
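The abstract describes DML only in words. The sketch below illustrates one standard form of it: the cross-fitted partially linear model Y = βA + g(S) + ε, estimated by regressing the outcome residuals on the exposure residuals. It rests on an assumption made only for this illustration: the unmeasured confounder is a smooth function of spatial location S, so regressing both Y and A on S removes the confounding. This is not necessarily the estimand, learner, or variance estimator used in the paper. The k-nearest-neighbour learner `knn_predict`, the function `dml_partially_linear`, and the simulation design are all hypothetical. The standard error uses the i.i.d. sandwich formula, which ignores spatial dependence.

```python
import numpy as np


def knn_predict(train_x, train_y, test_x, k=20):
    """Hypothetical flexible learner: k-nearest-neighbour regression.

    Predicts each test point as the mean response of its k closest
    training points (Euclidean distance on spatial coordinates).
    """
    # Squared distances between every test point and every training point.
    d2 = ((test_x[:, None, :] - train_x[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d2, axis=1)[:, :k]
    return train_y[nearest].mean(axis=1)


def dml_partially_linear(y, a, s, n_folds=5, k=20, seed=0):
    """Cross-fitted DML estimate of beta in Y = beta * A + g(S) + noise.

    Assumption (for illustration only): the unmeasured confounder is a
    function of location S, so regressing Y and A on S removes confounding.
    """
    n = len(y)
    rng = np.random.default_rng(seed)
    folds = rng.permutation(n) % n_folds  # balanced random fold labels
    y_res = np.empty(n)
    a_res = np.empty(n)
    for f in range(n_folds):
        held_out = folds == f
        train = ~held_out
        # Nuisance regressions E[Y | S] and E[A | S] are fit on the other
        # folds and evaluated on the held-out fold (cross-fitting).
        y_res[held_out] = y[held_out] - knn_predict(s[train], y[train], s[held_out], k)
        a_res[held_out] = a[held_out] - knn_predict(s[train], a[train], s[held_out], k)
    # Residual-on-residual regression gives the effect estimate.
    beta = np.sum(a_res * y_res) / np.sum(a_res ** 2)
    # i.i.d. sandwich standard error; it ignores spatial dependence.
    psi = a_res * (y_res - beta * a_res)
    se = np.sqrt(np.mean(psi ** 2)) / np.mean(a_res ** 2) / np.sqrt(n)
    return beta, se


# Hypothetical simulation: the confounder U is a smooth function of location.
rng = np.random.default_rng(1)
n = 2000
s = rng.uniform(0.0, 1.0, size=(n, 2))            # spatial coordinates
u = np.sin(2 * np.pi * s[:, 0]) + np.cos(2 * np.pi * s[:, 1])
a = u + rng.normal(size=n)                        # exposure depends on U
y = 2.0 * a + 3.0 * u + rng.normal(size=n)        # true effect of A is 2

naive = np.cov(a, y)[0, 1] / np.var(a, ddof=1)    # ignores the confounder
beta, se = dml_partially_linear(y, a, s)
print(f"naive slope: {naive:.2f}")
print(f"DML estimate: {beta:.2f} (SE {se:.3f})")
```

Under this design, the naive slope has population value 2 + 3·Cov(A, U)/Var(A) = 2 + 3·(1/2) = 3.5, while the true effect is 2. Cross-fitting means each nuisance regression is fit on folds that exclude the points where it is evaluated. That separation is what allows a more flexible learner, such as a random forest or boosting, to replace the k-NN learner without its overfitting biasing the effect estimate.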
