Diagnosing bottlenecks in development-stage field experiments: Troubleshooting and finding opportunities for improvement

Randomized trials of social programs yield internally valid estimates of causal impacts on key outcomes. While estimates of gross impact serve as useful summaries, program developers need deeper information to drive improvement efforts, especially when no impact is observed. The first goal of this work is to present a seven-step diagnostic method for assessing process bottlenecks in experiments. Designed for programs still in development, the troubleshooting sequence uses mixed methods to assess where in a program’s logic model the process breaks down. It includes post-experimental methods, built into the design, that account for impact variation and test where effects intensify or diminish. The second goal is to demonstrate one such method in detail. The approach tests the relationship between fidelity of program implementation and impact. First, levels of achieved fidelity in the treatment group are modeled as a function of informative baseline covariates. The model is then used to index fidelity in both conditions. Because they are informed only by pre-randomization characteristics of individuals, the model-based fidelity scores are free of endogeneity bias and allow assessment of whether impacts on key outcomes vary with level of fidelity. Results can help program developers focus improvement efforts. We illustrate the seven-step diagnostic process with a randomized trial of the Internet-Based Reading Apprenticeship Improving Science Education (iRAISE) program, in which 82 high school science teachers and 1,468 students were randomly assigned to the literacy program or to a control condition. There was no overall impact on achievement. Applying the diagnostic process revealed that this was not due to a weak program contrast between conditions or to an inadequate assessment; rather, the lower-than-expected impact was likely due to weaker-than-intended implementation.
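To make the fidelity-projection step concrete, the sketch below implements the general idea in Python on simulated data. The variable names (pretest, years_experience), the simulated data, and the single-level OLS specification are illustrative assumptions only; the actual iRAISE analysis would, among other things, need to account for the clustering of students within teachers, which this sketch omits.

```python
# Minimal sketch of the model-based fidelity approach, under assumed
# variable names and a simplified single-level OLS specification.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 400

# Simulated illustrative data: baseline covariates, random assignment,
# fidelity observed only under treatment, and an achievement outcome.
df = pd.DataFrame({
    "pretest": rng.normal(size=n),
    "years_experience": rng.integers(1, 20, size=n),
    "treatment": rng.integers(0, 2, size=n),
})
df["fidelity"] = np.where(
    df["treatment"] == 1,
    0.5 + 0.2 * df["pretest"] + 0.01 * df["years_experience"]
    + rng.normal(scale=0.1, size=n),
    np.nan,  # fidelity is unobserved (undefined) in the control group
)
df["outcome"] = (
    0.3 * df["pretest"]
    + 0.4 * df["treatment"] * np.nan_to_num(df["fidelity"], nan=0.0)
    + rng.normal(size=n)
)

# Step 1: model achieved fidelity from baseline covariates, using the
# treatment group only (the only units with observed fidelity).
fidelity_model = smf.ols(
    "fidelity ~ pretest + years_experience",
    data=df[df["treatment"] == 1],
).fit()

# Step 2: index every unit -- treatment and control -- with a predicted
# fidelity score. Because the score depends only on pre-randomization
# characteristics, it is defined symmetrically in both conditions and
# is not contaminated by post-assignment (endogenous) behavior.
df["fidelity_hat"] = fidelity_model.predict(df)

# Step 3: test whether the treatment effect varies with predicted
# fidelity via a treatment-by-fidelity interaction.
impact_model = smf.ols(
    "outcome ~ treatment * fidelity_hat + pretest", data=df
).fit()
print(impact_model.summary())
```

In this sketch, the coefficient on the treatment-by-fidelity_hat interaction is the quantity of diagnostic interest: a positive, reliable interaction would indicate that impacts concentrate where predicted fidelity is high, pointing improvement efforts toward implementation rather than toward the program model or the outcome measure.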
