Site Selection Bias in Program Evaluation

This paper is a substantial revision of a manuscript titled "External Validity and Partner Selection Bias" on which Sendhil Mullainathan was a co-author. Although he is no longer a co-author, this project has benefited enormously from his insights. I thank Josh Angrist, Amitabh Chandra, Lucas Davis, Kyle Dropp, Meredith Fowlie, Xavier Gine, Chuck Goldman, Matt Harding, Joe Hotz, Guido Imbens, Larry Katz, Chris Knittel, Dan Levy, Jens Ludwig, Konrad Menzel, Emily Oster, Rohini Pande, Todd Rogers, Piyush Tantia, Ed Vytlacil, Heidi Williams, and seminar participants at the ASSA meetings, Berkeley, Columbia, Harvard, MIT, NBER Labor Studies, NBER Energy and Environmental Economics, NEUDC, the UCSB/UCLA Conference on Field Experiments, and the World Bank for insights and helpful advice. Thanks also to Tyler Curtis, Marc Laitin, Alex Laskey, Alessandro Orfei, Nate Srinivas, Dan Yates, and others at Opower for fruitful discussions. Christina Larkin provided timely research assistance. The views expressed herein are those of the author and do not necessarily reflect the views of the National Bureau of Economic Research.

NBER working papers are circulated for discussion and comment purposes. They have not been peer-reviewed or been subject to the review by the NBER Board of Directors that accompanies official NBER publications.

ABSTRACT

"Site selection bias" occurs when the probability that partners adopt or evaluate a program is correlated with treatment effects. I test for site selection bias in the context of the Opower energy conservation programs, using 111 randomized control trials (RCTs) involving 8.6 million households across the United States. Predictions based on rich microdata from the first ten replications substantially overstate efficacy in the next 101 sites. There is evidence of two positive selection mechanisms. First, local populations with stronger preferences for environmental conservation both encourage utilities to adopt the program and are more responsive to the treatment. Second, program managers initially target treatment at the most responsive consumer sub-populations, meaning that efficacy drops when utilities expand the program. While it may be optimal to initially target an intervention toward the most responsive populations, these results show how analysts can be systematically biased when extrapolating experimental results, even after many replications. I augment the Opower results by showing that microfinance institutions (MFIs) that run RCTs differ from the global population of MFIs and that hospitals that host clinical trials differ from the national population of hospitals.
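To make the selection mechanism concrete, the following is a minimal numerical sketch, not the paper's data or estimator, with all numbers hypothetical: when a site's probability of adopting the program early increases with its own treatment effect, the average effect among early-adopting sites exceeds the average effect across all sites, so extrapolating from the early sites overstates efficacy elsewhere.

```python
# Illustrative simulation of site selection bias (hypothetical numbers only).
# Sites whose populations respond more strongly to the treatment are also
# more likely to adopt the program early, so the mean effect among early
# adopters overstates the mean effect in the full population of sites.
import numpy as np

rng = np.random.default_rng(0)

n_sites = 1000
# Heterogeneous site-level treatment effects (e.g., % reduction in energy use).
tau = rng.normal(loc=2.0, scale=1.0, size=n_sites)

# Adoption probability increasing in the site's own treatment effect
# (e.g., environmentally minded populations both push utilities to adopt
# the program and respond more strongly to it).
adopt_prob = 1.0 / (1.0 + np.exp(-(tau - 2.0)))
early_adopters = rng.random(n_sites) < adopt_prob

print(f"Mean effect, early-adopting sites: {tau[early_adopters].mean():.2f}")
print(f"Mean effect, all sites:            {tau.mean():.2f}")
# The first number exceeds the second: an analyst extrapolating from the
# early adopters overstates efficacy in the remaining sites, no matter how
# many replications are run within the selected set.
```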
