Omitted Variable Bias in Machine Learned Causal Models

We derive general, yet simple, sharp bounds on the size of the omitted variable bias for a broad class of causal parameters that can be identified as linear functionals of the conditional expectation function of the outcome. Such functionals encompass many of the traditional targets of investigation in causal inference studies, such as (weighted) averages of potential outcomes, average treatment effects (including subgroup effects, such as the effect on the treated), (weighted) average derivatives, and policy effects from shifts in covariate distribution, all for general, nonparametric causal models. Our construction relies on the Riesz-Fréchet representation of the target functional. Specifically, we show that the bound on the bias depends only on the additional variation that the latent variables create both in the outcome and in the Riesz representer for the parameter of interest. Moreover, in many important cases (e.g., average treatment effects in partially linear models, or in nonseparable models with a binary treatment) the bound is shown to depend on two easily interpretable quantities: the nonparametric partial R² (Pearson’s “correlation ratio”) of the unobserved variables with the treatment and with the outcome. Therefore, simple plausibility judgments on the maximum explanatory power of omitted variables (in explaining treatment and outcome variation) are sufficient to place overall bounds on the size of the bias. Finally, leveraging debiased machine learning, we provide flexible and efficient statistical inference methods to estimate the components of the bounds that are identifiable from the observed distribution.
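To make the role of the two partial R² quantities concrete, the sketch below works through the classical linear-regression special case only; it is not the nonparametric, debiased machine learning procedure described above. It assumes a simulated dataset in which the latent variable `a` is actually available, so the bias of the "short" regression (omitting `a`) can be compared with the value implied by the two partial R² values and a scale factor. All variable names, coefficients, and the sample size are hypothetical choices made for illustration.

```python
import numpy as np

def residualize(v, W):
    """Residual of v after least-squares projection on the columns of W."""
    coef, *_ = np.linalg.lstsq(W, v, rcond=None)
    return v - W @ coef

def partial_r2(v, z, W):
    """Squared correlation of v and z after both are residualized on W."""
    return np.corrcoef(residualize(v, W), residualize(z, W))[0, 1] ** 2

# Hypothetical linear data-generating process (assumption, for illustration).
rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=n)                       # observed covariate
a = rng.normal(size=n)                       # latent (omitted) variable
d = 0.5 * x + 0.7 * a + rng.normal(size=n)   # treatment
y = 1.0 * d + 0.3 * x + 0.8 * a + rng.normal(size=n)  # outcome

ones = np.ones(n)
X = np.column_stack([ones, x])        # observed controls
XA = np.column_stack([ones, x, a])    # controls plus the latent variable
XD = np.column_stack([ones, x, d])    # controls plus the treatment

# Coefficient on d via partialling out (Frisch-Waugh-Lovell).
d_short = residualize(d, X)
theta_short = (d_short @ residualize(y, X)) / (d_short @ d_short)
d_long = residualize(d, XA)
theta_long = (d_long @ residualize(y, XA)) / (d_long @ d_long)

# The two "explanatory power" quantities for the omitted variable.
r2_outcome = partial_r2(y, a, XD)     # a with outcome, given treatment and x
r2_treatment = partial_r2(d, a, X)    # a with treatment, given x

# Scale factor computed from observed data only.
scale = np.std(residualize(y, XD)) / np.std(d_short)

bias = abs(theta_short - theta_long)
implied = np.sqrt(r2_outcome * r2_treatment / (1.0 - r2_treatment)) * scale

print(f"short: {theta_short:.4f}  long: {theta_long:.4f}")
print(f"|bias|: {bias:.4f}  implied by partial R^2: {implied:.4f}")
```

In this linear setting with a single omitted variable, the two printed quantities agree up to floating-point error. In an actual sensitivity analysis `a` is unobserved, so one would replace `r2_outcome` and `r2_treatment` with posited upper limits and read `implied` as a bound on the bias rather than its exact value.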
