Evaluating Ex Ante Counterfactual Predictions Using Ex Post Causal Inference

We derive a formal, decision-based method for comparing the performance of counterfactual treatment regime predictions using the results of experiments that are informative about the distribution of treated outcomes. Our approach allows us to quantify differential performance, and to assess its statistical significance, for optimal treatment regimes estimated from structural models, extrapolated treatment effects, expert opinion, and other methods. We apply our method to evaluate optimal treatment regimes for conditional cash transfer programs across countries, where predictions are generated using data from experimental evaluations in other countries and pre-program data from the country of interest.
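To make the idea of comparing treatment regimes on ex post experimental data concrete, the following is a minimal sketch and not the estimator derived in the paper. It assumes a single experiment with a known randomization probability, scores each candidate regime by an inverse-probability-weighted mean outcome, and reports a simple z-statistic for the difference between two regimes. The function names (`ipw_scores`, `compare_rules`, `simulate`), the simulated data-generating process, and the two example rules are all hypothetical and chosen only for illustration.

```python
import math
import random
from statistics import mean, stdev


def ipw_scores(data, rule, p_treat):
    """Per-unit inverse-probability-weighted welfare scores for a rule.

    data    : list of (x, d, y) with covariate x, treatment d in {0, 1}, outcome y
    rule    : function x -> 0 or 1 (the treatment the regime would assign)
    p_treat : known probability of treatment in the experiment (assumed)
    """
    scores = []
    for x, d, y in data:
        a = rule(x)
        prob = p_treat if a == 1 else 1.0 - p_treat
        # Keep the outcome only when the experiment happened to assign
        # the treatment that the rule would have assigned.
        scores.append(y * (d == a) / prob)
    return scores


def compare_rules(data, rule_a, rule_b, p_treat):
    """Estimated welfare difference (A minus B), its standard error, and z."""
    sa = ipw_scores(data, rule_a, p_treat)
    sb = ipw_scores(data, rule_b, p_treat)
    diffs = [a - b for a, b in zip(sa, sb)]
    est = mean(diffs)
    se = stdev(diffs) / math.sqrt(len(diffs))
    z = est / se if se > 0 else float("nan")
    return est, se, z


def simulate(n, seed, p_treat=0.5):
    """Hypothetical experiment: the treatment helps only units with x > 0.5."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.random()
        d = 1 if rng.random() < p_treat else 0
        y = x + d * 2.0 * (x - 0.5) + rng.gauss(0.0, 0.5)
        data.append((x, d, y))
    return data


# Two candidate regimes, standing in for predictions from different sources.
def targeted(x):
    return 1 if x > 0.5 else 0


def treat_all(x):
    return 1


data = simulate(n=5000, seed=0)
est, se, z = compare_rules(data, targeted, treat_all, p_treat=0.5)
print(f"welfare difference = {est:.3f}, se = {se:.3f}, z = {z:.2f}")
```

In this simulated design the targeted rule has a true welfare advantage of 0.25 over treating everyone, so a large positive z-statistic is expected. Working with per-unit score differences, rather than two separate means, keeps the standard error honest about the correlation between the two regimes' scores on the same sample.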
