Leveraging population outcomes to improve the generalization of experimental results: Application to the JTPA study

Generalizing causal estimates from randomized experiments to a broader target population is essential for guiding decisions by policymakers and practitioners in the social and biomedical sciences. While recent papers have developed various weighting estimators for the population average treatment effect (PATE), many of these methods result in large variance because the experimental sample often differs substantially from the target population, which makes the estimated sampling weights extreme. We investigate this practical problem, motivated by an evaluation study of the Job Training Partnership Act (JTPA), in which we examine how well we can generalize the causal effect of job training programs beyond a specific population of economically disadvantaged adults and youths. In particular, we propose post-residualized weighting, in which we use the outcome measured in the observational population data to build a flexible predictive model (e.g., with machine learning methods) and residualize the outcome in the experimental data before applying conventional weighting methods. We show that the proposed PATE estimator is consistent under the same assumptions required for existing weighting methods, importantly without assuming that the predictive model is correctly specified. We demonstrate the efficiency gains from this approach through our JTPA application, where we find a reduction in variance of between 5% and 25%.
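The residualize-then-weight idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the simulated data, the use of a least-squares fit as a stand-in for a flexible learner, and the assumption that sampling weights are already available are all hypothetical choices made for illustration. The estimator shown is a normalized (Hajek-style) weighted difference in means applied to residuals.

```python
import numpy as np

def weighted_diff_in_means(y, t, w):
    """Normalized weighted difference in means between treated and control units."""
    y, t, w = map(np.asarray, (y, t, w))
    treated = t == 1
    control = ~treated
    mu1 = np.sum(w[treated] * y[treated]) / np.sum(w[treated])
    mu0 = np.sum(w[control] * y[control]) / np.sum(w[control])
    return mu1 - mu0

def fit_linear_predictor(x_pop, y_pop):
    """Fit outcome ~ covariates on population data (stand-in for any flexible model)."""
    design = np.column_stack([np.ones(len(x_pop)), x_pop])
    coef, *_ = np.linalg.lstsq(design, y_pop, rcond=None)
    return lambda x: np.column_stack([np.ones(len(x)), x]) @ coef

def post_residualized_estimate(y, t, w, x, predictor):
    """Subtract population-model predictions from experimental outcomes, then weight."""
    residuals = np.asarray(y) - predictor(x)
    return weighted_diff_in_means(residuals, t, w)

# Simulated illustration (all values below are assumptions, not JTPA data).
rng = np.random.default_rng(0)
beta = np.array([2.0, -1.0])

# Observational population data: covariates and outcomes, no treatment assigned.
x_pop = rng.normal(size=(5000, 2))
y_pop = 1.0 + x_pop @ beta + rng.normal(size=5000)

# Experimental sample drawn from a shifted covariate distribution.
n = 500
x_exp = rng.normal(loc=0.5, size=(n, 2))
t = rng.integers(0, 2, size=n)
y_exp = 1.0 + x_exp @ beta + 1.5 * t + rng.normal(size=n)

# Hypothetical sampling weights, proportional to the population/sample density ratio.
w = np.exp(-0.5 * x_exp.sum(axis=1))

predictor = fit_linear_predictor(x_pop, y_pop)
print("weighted estimate:         ", weighted_diff_in_means(y_exp, t, w))
print("post-residualized estimate:", post_residualized_estimate(y_exp, t, w, x_exp, predictor))
```

Because the same prediction is subtracted from both treatment arms, a poor predictive model does not bias the weighted contrast in expectation; it only fails to remove outcome variation. In this sketch the true effect is set to 1.5, and the two estimates can be compared across repeated simulation draws to see how much variability the residualization removes.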
