The Machine Learning Control Method for Counterfactual Forecasting (preprint)

Without a credible control group, the most widely used methods for estimating causal effects cannot be applied. To fill this gap, we propose the Machine Learning Control Method (MLCM), a new approach to causal panel analysis based on counterfactual forecasting with machine learning. The MLCM estimates policy-relevant causal parameters in short- and long-panel settings without relying on untreated units. We formalize identification in the potential outcomes framework and then propose an estimation procedure based on supervised machine learning algorithms. To illustrate the advantages of our estimator, we present simulation evidence and an empirical application on the impact of the COVID-19 crisis on educational inequality in Italy. We implement the proposed method in the companion R package MachineControl.
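The core idea of counterfactual forecasting can be illustrated with a toy example. The sketch below is not the MachineControl API and not the authors' estimator. It is a minimal Python illustration under stated assumptions: every unit is treated at the same period t0 (so no untreated units exist), the only predictor is the lagged outcome, and an ordinary least-squares fit stands in for the supervised machine learning algorithms the method actually relies on. A model is trained on pre-treatment transitions only, used to forecast each unit's untreated outcome at t0, and the unit-level effects are the gaps between observed and forecast outcomes. The names `fit_ols`, `counterfactual_att`, and `simulate_unit` are hypothetical.

```python
from statistics import mean


def fit_ols(xs, ys):
    """Least-squares fit of y = a + b*x (a simple stand-in for an ML learner)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


def counterfactual_att(panel, t0):
    """panel: dict unit -> outcomes [y_0, ..., y_t0]; all units treated from t0.

    Returns the average effect at t0 and the per-unit effects.
    """
    # Training set: pre-treatment transitions (y_{t-1} -> y_t) for t < t0 only.
    xs, ys = [], []
    for series in panel.values():
        for t in range(1, t0):
            xs.append(series[t - 1])
            ys.append(series[t])
    a, b = fit_ols(xs, ys)
    # Forecast each unit's untreated outcome at t0 from its last pre-treatment value.
    effects = {}
    for unit, series in panel.items():
        y_hat = a + b * series[t0 - 1]
        effects[unit] = series[t0] - y_hat
    return mean(list(effects.values())), effects


def simulate_unit(y0, t0, effect):
    """Toy AR(1)-style series; the treatment adds `effect` from period t0 on (hypothetical DGP)."""
    series = [y0]
    for t in range(1, t0 + 1):
        y = 1.0 + 0.5 * series[-1]
        if t >= t0:
            y += effect
        series.append(y)
    return series


panel = {f"unit{i}": simulate_unit(y0, t0=4, effect=2.0)
         for i, y0 in enumerate([0.0, 4.0, 8.0, 12.0])}
att, effects = counterfactual_att(panel, t0=4)
print(round(att, 6))  # 2.0
```

Because no unit is left untreated, each counterfactual comes entirely from the unit's own pre-treatment history, so the quality of the estimate hinges on how well the forecasting model captures pre-treatment dynamics. In this noiseless toy data the linear fit is exact and the true effect of 2.0 is recovered; with real data a more flexible learner and a principled choice among learners would matter.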
