Causal Inference: A Missing Data Perspective

Inferring the causal effects of treatments is a central goal in many disciplines. The potential outcomes framework is one of the main statistical approaches to causal inference; it defines a causal effect as a comparison of the potential outcomes of the same units under different treatment conditions. Because at most one potential outcome is observed for each unit and the rest are missing, causal inference is inherently a missing data problem. Indeed, the two fields share a close analogy in both terminology and inferential framework. Despite this intrinsic connection, statistical analyses of causal inference and of missing data also differ markedly in aims, settings and methods. This article provides a systematic review of causal inference from the missing data perspective. Focusing on ignorable treatment assignment mechanisms, we discuss a wide range of causal inference methods that have analogues in missing data analysis, such as imputation, inverse probability weighting and doubly-robust methods. Under each of the three modes of inference (Frequentist, Bayesian, and Fisherian randomization), we present the general structure of inference for both finite-sample and super-population estimands, and illustrate it with specific examples. We identify open questions to motivate further research that bridges the two fields.
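As a rough illustration of the missing data view, the sketch below simulates both potential outcomes for each unit, reveals only the one matching the assigned treatment, and compares a naive difference in means with an inverse probability weighting (Horvitz-Thompson type) estimate of the average treatment effect. Everything in it is an assumption made for illustration and is not taken from the article: the data-generating model, the constant effect of 2.0, the known propensity score `0.2 + 0.6 * x`, and the function names `simulate`, `ipw_ate` and `naive_diff`.

```python
import random


def simulate(n, seed=0):
    """Simulate n units under a hypothetical ignorable assignment mechanism."""
    rng = random.Random(seed)
    units = []
    for _ in range(n):
        x = rng.random()                # observed covariate, uniform on [0, 1)
        y0 = x + rng.gauss(0, 0.1)      # potential outcome under control
        y1 = y0 + 2.0                   # potential outcome under treatment (assumed effect 2.0)
        e = 0.2 + 0.6 * x               # assumed known propensity score P(Z = 1 | X = x)
        z = 1 if rng.random() < e else 0
        y_obs = y1 if z == 1 else y0    # the other potential outcome stays missing
        units.append((x, z, y_obs, e))
    return units


def ipw_ate(units):
    """Inverse probability weighting estimate of the average treatment effect."""
    n = len(units)
    treated_mean = sum(z * y / e for _, z, y, e in units) / n
    control_mean = sum((1 - z) * y / (1 - e) for _, z, y, e in units) / n
    return treated_mean - control_mean


def naive_diff(units):
    """Unadjusted difference in observed means; biased here because X drives both Z and Y."""
    treated = [y for _, z, y, _ in units if z == 1]
    control = [y for _, z, y, _ in units if z == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)


if __name__ == "__main__":
    data = simulate(20000, seed=0)
    print("naive difference in means:", round(naive_diff(data), 3))
    print("IPW estimate of the ATE:  ", round(ipw_ate(data), 3))
```

In this toy setup the covariate raises both the chance of treatment and the outcome, so the naive comparison overstates the effect, while weighting each observed outcome by the inverse of its assignment probability corrects for which potential outcome happens to be missing. In practice the propensity score is usually unknown and has to be estimated.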
