External Validity and Partner Selection Bias

Program evaluation often involves generalizing internally valid, site-specific estimates to a different population or environment. While there is substantial evidence on the internal validity of non-experimental relative to experimental estimates (e.g. LaLonde 1986), there is little quantitative evidence on the external validity of site-specific estimates, because identical treatments are rarely evaluated in multiple settings. This paper examines a remarkable series of 14 energy conservation field experiments run by a company called OPOWER, involving 550,000 households in different cities across the U.S. Despite the availability of potentially promising individual-level controls, we show that the unexplained variation in treatment effects across sites is both statistically and economically significant. Furthermore, we show that the electric utilities that partner with OPOWER differ systematically on characteristics that are correlated with the treatment effect, providing evidence of a "partner selection bias" that is analogous to biases caused by individual-level selection into treatment. We augment this result in a different context by showing that partner microfinance institutions (MFIs) that carry out randomized experiments appear to be selected on observable characteristics from the global pool of MFIs. Finally, we propose a statistical test for parameter heterogeneity at "sub-sites" within a site that provides suggestive evidence on whether site-specific estimates can be generalized.
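The cross-site heterogeneity claim can be illustrated with a standard meta-analytic check. The sketch below is a minimal illustration under assumed inputs, not the paper's exact procedure: it takes hypothetical site-level treatment-effect estimates and standard errors and computes Cochran's Q statistic, which is chi-square distributed with S − 1 degrees of freedom under the null of a common effect across sites. All numbers are invented for illustration.

```python
import numpy as np
from scipy import stats

def cochran_q_test(estimates, std_errors):
    """Test the null that all site-specific treatment effects are equal.

    Given site-level point estimates beta_s and standard errors se_s,
    Q = sum_s w_s * (beta_s - beta_bar)^2 with w_s = 1 / se_s^2,
    where beta_bar is the precision-weighted mean. Under the null of a
    common effect, Q is chi-square with (S - 1) degrees of freedom.
    """
    estimates = np.asarray(estimates, dtype=float)
    weights = 1.0 / np.asarray(std_errors, dtype=float) ** 2
    pooled = np.sum(weights * estimates) / np.sum(weights)  # precision-weighted mean
    q_stat = np.sum(weights * (estimates - pooled) ** 2)
    dof = len(estimates) - 1
    p_value = stats.chi2.sf(q_stat, dof)
    return pooled, q_stat, dof, p_value

# Hypothetical site-level effects (percent reductions in electricity use)
# and standard errors; these are illustrative values only.
betas = [-2.1, -1.4, -2.8, -1.0, -2.3]
ses = [0.3, 0.4, 0.3, 0.5, 0.4]
pooled, q, dof, p = cochran_q_test(betas, ses)
print(f"pooled effect = {pooled:.2f}, Q = {q:.2f} (df = {dof}), p = {p:.4f}")
```

A small p-value rejects a common treatment effect across sites; the same logic applies one level down when the "sites" are sub-sites within a single experiment.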

[1]  Edward Miguel,et al.  Worms: Identifying Impacts on Education and Health in the Presence of Treatment Externalities, Data User's Guide , 2014 .

[2]  Esther Duflo,et al.  Do Labor Market Policies Have Displacement Effects? Evidence from a Clustered Randomized Experiment , 2012 .

[3]  H. Allcott,et al.  Is There an Energy Efficiency Gap? , 2012 .

[4]  C. S. Reichardt,et al.  Regression-discontinuity designs. , 2012 .

[5]  H. Allcott,et al.  Social Norms and Energy Conservation , 2011 .

[6]  Christopher R. Walters,et al.  Explaining Charter School Effectiveness. NBER Working Paper No. 17332. , 2011 .

[7]  Jeffrey R. Kling,et al.  Mechanism Experiments and Policy Evaluations , 2011 .

[8]  J. Angrist,et al.  ExtrapoLATE-ing: External Validity and Overidentification in the LATE Framework , 2010 .

[9]  C. Manski Policy Analysis with Incredible Certitude , 2010 .

[10]  Angus Deaton Instruments, Randomization, and Learning about Development , 2010 .

[11]  Nancy Cartwright,et al.  Hunting causes and using them: approaches in philosophy and economics: summary , 2010 .

[12]  Angus Deaton,et al.  Understanding the Mechanisms of Economic Development , 2010 .

[13]  Matthew E. Kahn,et al.  Energy Conservation "Nudges" and Environmentalist Ideology: Evidence from a Randomized Residential Electricity Field Experiment , 2010 .

[14]  S. Mullainathan,et al.  Behavior and Energy Policy , 2010, Science.

[15]  M. Whinston,et al.  Taking the Dogma out of Econometrics: Structural Modeling and Credible Inference , 2010 .

[16]  Joshua D. Angrist,et al.  The Credibility Revolution in Empirical Economics: How Better Research Design is Taking the Con Out of Econometrics , 2010, SSRN Electronic Journal.

[17]  Jessica Cohen,et al.  What Works in Development?: Thinking Big and Thinking Small , 2010 .

[18]  James J Heckman,et al.  Comparing IV with Structural Models: What Simple IV Can and Cannot Identify , 2009, Journal of econometrics.

[19]  Victor Lavy,et al.  Multiple Experiments for the Causal Link between the Quantity and Quality of Children , 2006, Journal of Labor Economics.

[20]  A. Banerjee,et al.  The Miracle of Microfinance? Evidence from a Randomized Evaluation , 2013 .

[21]  Parag A. Pathak,et al.  Accountability and Flexibility in Public Schools: Evidence from Boston's Charters and Pilots , 2009 .

[22]  I. Ayres,et al.  Evidence from Two Large Field Experiments that Peer Comparison Feedback Can Reduce Residential Energy Usage , 2009 .

[23]  G. Imbens,et al.  Better Late than Nothing: Some Comments on Deaton (2009) and Heckman and Urzua (2009) , 2009 .

[24]  David S. Lee,et al.  Regression Discontinuity Designs in Economics , 2009 .

[25]  Sendhil Mullainathan,et al.  What's Advertising Content Worth? Evidence from a Consumer Credit Marketing Field Experiment , 2009 .

[26]  Alaka Holla,et al.  Pricing and Access: Lessons from Randomized Evaluations in Education and Health , 2008 .

[27]  Dani Rodrik,et al.  The New Development Economics: We Shall Experiment, but How Shall We Learn? , 2008 .

[28]  P. Reiss,et al.  What changes energy consumption? Prices and public pressures , 2008 .

[29]  Steven D. Levitt,et al.  Field Experiments in Economics: The Past, the Present, and the Future , 2008 .

[30]  Noah J. Goldstein,et al.  Normative Social Influence is Underdetected , 2008, Personality & social psychology bulletin.

[31]  Lucas W. Davis Durable goods and residential demand for energy and water: evidence from a field trial , 2008 .

[32]  J. Worrall Evidence in Medicine and Evidence‐Based Medicine , 2007 .

[33]  Noah J. Goldstein,et al.  The Constructive, Destructive, and Reconstructive Power of Social Norms , 2007, Psychological science.

[34]  Steven D. Levitt,et al.  What Do Laboratory Experiments Measuring Social Preferences Reveal About the Real World , 2007 .

[35]  Nancy Cartwright,et al.  Are RCTs the Gold Standard? , 2007 .

[36]  James J. Heckman,et al.  Econometric Evaluation of Social Programs, Part I: Causal Models, Structural Models and Econometric Policy Evaluation , 2007 .

[37]  James J. Heckman,et al.  Econometric Evaluation of Social Programs, Part II: Using the Marginal Treatment Effect to Organize Alternative Econometric Estimators to Evaluate Social Programs, and to Forecast their Effects in New Environments , 2007 .

[38]  Petra E. Todd,et al.  Assessing the Impact of a School Subsidy Program in Mexico: Using a Social Experiment to Validate a Dynamic Behavioral Model of Child Schooling and Fertility. , 2006, The American economic review.

[39]  James J Heckman,et al.  Understanding Instrumental Variables in Models with Essential Heterogeneity , 2006, The Review of Economics and Statistics.

[40]  Jacob Alex Klerman,et al.  Evaluating the Differential Effects of Alternative Welfare-to-Work Training Components: A Re-Analysis of the California GAIN Program , 2006 .

[41]  Esther Duflo,et al.  Monitoring Works: Getting Teachers to Come to School , 2007 .

[42]  V. J. Hotz,et al.  Predicting the efficacy of future training programs using past experiences at other locations , 2005 .

[43]  P. Rothwell,et al.  External validity of randomised controlled trials: “To whom do the results of this trial apply?” , 2005, The Lancet.

[44]  Christopher R. Taber,et al.  Selection on Observed and Unobserved Variables: Assessing the Effectiveness of Catholic Schools , 2000, Journal of Political Economy.

[45]  Dean S. Karlan,et al.  Observing Unobservables: Identifying Information Asymmetries with a Consumer Credit Field Experiment , 2005 .

[46]  Esther Duflo,et al.  WOMEN AS POLICY MAKERS: EVIDENCE FROM A RANDOMIZED POLICY EXPERIMENT IN INDIA , 2004 .

[47]  D. Greenberg Digest of Social Experiments, Third Edition , 2004 .

[48]  Accounting for Limited Overlap in Estimation of Average Treatment Effects under Unconfoundedness , 2004 .

[49]  Charu Sharma,et al.  Iron deficiency anemia and school participation , 2004 .

[50]  Esther Duflo,et al.  Scaling Up and Evaluation , 2003 .

[51]  C. Meghir,et al.  Using randomized experiments and structural models for 'scaling up': evidence from the PROGRESA evaluation , 2003 .

[52]  Rajeev Dehejia,et al.  Was There a Riverside Miracle? A Hierarchical Framework for Evaluating Programs With Grouped Data , 2003 .

[53]  J. Heckman,et al.  STRUCTURAL EQUATIONS, TREATMENT EFFECTS AND ECONOMETRIC POLICY , 2003 .

[54]  Robin Jacob,et al.  Moving to Opportunity for Fair Housing Demonstration Program , 2003 .

[55]  Lant Pritchett,et al.  It pays to be ignorant: A simple political economy of rigorous program evaluation , 2002 .

[56]  Catherine P. Bradshaw,et al.  The use of propensity scores to assess the generalizability of results from randomized trials , 2011, Journal of the Royal Statistical Society, Series A.

[57]  J. Heckman,et al.  Policy-Relevant Treatment Effects , 2001 .

[58]  Jeffrey A. Smith,et al.  Does Matching Overcome Lalonde's Critique of Nonexperimental Estimators? , 2000 .

[59]  G. Imbens,et al.  Efficient Estimation of Average Treatment Effects Using the Estimated Propensity Score , 2000 .

[60]  J. Heckman,et al.  The Economics and Econometrics of Active Labor Market Programs , 1999 .

[61]  James J. Heckman,et al.  Characterizing Selection Bias Using Experimental Data , 1998 .

[62]  Petra E. Todd,et al.  Matching As An Econometric Evaluation Estimator: Evidence from Evaluating a Job Training Programme , 1997 .

[63]  Jeffrey A. Smith,et al.  The Sensitivity of Experimental Impact Estimates: Evidence from the National Jtpa Study , 1997 .

[64]  James J. Heckman,et al.  Assessing the Case for Social Experiments , 1995 .

[65]  Bruce D. Meyer Lessons from the U.S. Unemployment Insurance Experiments , 1995 .

[66]  Charles F. Manski,et al.  Evaluating Welfare and Training Programs. , 1994 .

[67]  Howard S. Bloom,et al.  The National JTPA Study: Title II-A Impacts on Earnings and Employment at 18 Months. Executive Summary. , 1992 .

[68]  V. Joseph Hotz,et al.  Designing Experimental Evaluations of Social Programs: The Case of the U.S. National JTPA Study , 1992 .

[69]  James J. Heckman,et al.  Randomization and Social Policy Evaluation , 1991 .

[70]  J. Heckman  Varieties of Selection Bias , 1990 .

[71]  Kevin M. Murphy,et al.  Estimation and Inference in Two-Step Econometric Models , 1985 .

[72]  R. LaLonde  Evaluating the Econometric Evaluations of Training Programs with Experimental Data , 1986 .

[73]  Dennis J. Aigner,et al.  The welfare econometrics of peak-load pricing for electricity: Editor's Introduction , 1984 .

[74]  D. Rubin,et al.  The central role of the propensity score in observational studies for causal effects , 1983 .

[75]  D. Meza Health insurance and the demand for medical care , 1983 .

[76]  J. Heckman Sample selection bias as a specification error , 1979 .

[77]  D. Rubin Estimating causal effects of treatments in randomized and nonrandomized studies. , 1974 .

[78]  D. Campbell,et al.  Experimental and Quasi-Experimental Designs for Research , 2012 .

[79]  D. Campbell Factors relevant to the validity of experiments in social settings. , 1957, Psychological bulletin.

[80]  D. Horvitz,et al.  A Generalization of Sampling Without Replacement from a Finite Universe , 1952 .