Machine learning in policy evaluation: new tools for causal inference

Machine learning (ML) methods have received considerable attention in recent years, but they are designed primarily for prediction. Empirical researchers conducting policy evaluations are, by contrast, preoccupied with causal problems and try to answer counterfactual questions: what would have happened in the absence of a policy? Because these counterfactuals can never be directly observed (the "fundamental problem of causal inference"), prediction tools from the ML literature cannot be readily used for causal inference. In the last decade, major innovations have incorporated supervised ML tools into estimators of causal parameters such as the average treatment effect (ATE). This holds the promise of attenuating model misspecification and increasing transparency in model selection. One particularly mature strand of the literature comprises approaches that use supervised ML to estimate the ATE of a binary treatment under the unconfoundedness and positivity assumptions (also known as the exchangeability and overlap assumptions). This article reviews popular supervised ML algorithms, including the Super Learner. It then introduces and illustrates three specific uses of ML for treatment effect estimation: (1) to create balance between treated and control groups; (2) to estimate so-called nuisance models (e.g., the propensity score or conditional expectations of the outcome) in semi-parametric estimators that target causal parameters (e.g., targeted maximum likelihood estimation or the double ML estimator); and (3) to select variables in settings with a large number of covariates.
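
As a rough illustration of use (2), the sketch below computes a cross-fitted augmented inverse probability weighted (AIPW) estimate of the ATE on simulated data. It is not taken from the article: the data-generating process, the function names (`simulate`, `fit_nuisance`, `aipw_ate`), the clipping bounds on the propensity score, and the use of simple stratum means as a stand-in for a supervised learner are all assumptions made for illustration.

```python
import random
from statistics import mean, stdev


def simulate(n, seed=0):
    """Hypothetical data: one discrete confounder x, binary treatment a, outcome y."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.randint(0, 2)                      # confounder with three levels
        p = [0.2, 0.5, 0.8][x]                     # true propensity score (assumed)
        a = 1 if rng.random() < p else 0
        y = 2.0 * a + 1.5 * x + rng.gauss(0, 1)    # true ATE is 2.0 by construction
        data.append((x, a, y))
    return data


def fit_nuisance(train):
    """Estimate e(x), m1(x), m0(x) by stratum means (stand-in for an ML learner)."""
    overall = mean(y for _, _, y in train)
    fits = {}
    for level in sorted({x for x, _, _ in train}):
        rows = [(a, y) for x, a, y in train if x == level]
        treated = [y for a, y in rows if a == 1]
        control = [y for a, y in rows if a == 0]
        e = min(max(len(treated) / len(rows), 0.01), 0.99)  # clip for positivity
        m1 = mean(treated) if treated else overall
        m0 = mean(control) if control else overall
        fits[level] = (e, m1, m0)
    return fits


def aipw_ate(data, k=5):
    """Cross-fitted AIPW: nuisance models are fit on k-1 folds, scored on the rest."""
    scores = []
    for fold in range(k):
        train = [row for i, row in enumerate(data) if i % k != fold]
        held_out = [row for i, row in enumerate(data) if i % k == fold]
        fits = fit_nuisance(train)
        for x, a, y in held_out:
            e, m1, m0 = fits[x]
            # Outcome-model contrast plus inverse-probability-weighted residuals
            psi = m1 - m0 + a * (y - m1) / e - (1 - a) * (y - m0) / (1 - e)
            scores.append(psi)
    ate = mean(scores)
    se = stdev(scores) / len(scores) ** 0.5       # influence-function-based SE
    return ate, se


data = simulate(4000)
ate, se = aipw_ate(data)
# Unadjusted difference in means, which is confounded by x in this simulation
naive = mean(y for _, a, y in data if a == 1) - mean(y for _, a, y in data if a == 0)
print(f"naive difference in means: {naive:.2f}")
print(f"cross-fitted AIPW ATE: {ate:.2f} (SE {se:.3f})")
```

In this sketch, `fit_nuisance` is the only place where a learner enters, so a more flexible supervised method could in principle be swapped in without changing the estimating equation. Fitting the nuisance models on folds that exclude the observation being scored is one common way to limit overfitting bias.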
