Same difference? Understanding variation in the estimation of effect sizes from educational trials

We applied four analytic models, with comparable outcomes and covariates, to a dataset of 20 outcomes from 17 educational trials. The interventions and their independent evaluations were all funded by the Education Endowment Foundation. In well-powered studies without serious implementation problems, the models produced closely matching results. Where an analysis took little account of research design, or where there were difficulties with implementation and data collection, the point estimates of effect differed and the estimates of precision varied. This variation adds to the challenge of comparing the impact of interventions and deciding which are worth scaling up.
