Assessing the Generalizability of Randomized Trial Results to Target Populations

Recent years have seen growing interest in evidence-based practices, where the “evidence” generally comes from well-conducted randomized trials. However, while those trials yield accurate estimates of the effect of the intervention for the participants in the trial (known as “internal validity”), they do not always yield relevant information about the effects in a particular target population (known as “external validity”). This may be because no target population was specified when the trial was designed, because it was difficult to recruit a sample representative of a prespecified target population, or because the target population of interest differs somewhat from the population the trial directly targeted. This paper first provides an overview of existing design and analysis methods for assessing and enhancing the ability of a randomized trial to estimate treatment effects in a target population. It then provides a case study using one particular method, which weights the subjects in a randomized trial to match the population on a set of observed characteristics. The case study uses data from a randomized trial of school-wide positive behavioral interventions and supports (PBIS); our interest is in generalizing the results to the state of Maryland. In the case of PBIS, after weighting, estimated effects in the target population were similar to those observed in the randomized trial. The paper illustrates that statistical methods can be used to assess and enhance the external validity of randomized trials, making the results more applicable to policy and clinical questions. However, many research questions remain open; future research should focus on treatment effect heterogeneity and on further developing methods for enhancing external validity.
Researchers should think carefully about the external validity of randomized trials and be cautious about extrapolating results to specific populations unless they are confident of the similarity between the trial sample and that target population.
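The weighting idea described in the abstract can be illustrated with a minimal sketch. It assumes a single categorical characteristic, so each trial subject is weighted by the ratio of the population share to the trial share of its stratum. The data, the stratum labels, and the function names `stratum_weights` and `weighted_effect` are hypothetical and chosen only for illustration. The weights used in the paper's case study may be constructed differently, and this sketch ignores clustering of students within schools.

```python
from collections import Counter


def stratum_weights(trial_strata, population_strata):
    """Return weight per stratum = population share / trial share.

    Strata present in the population but absent from the trial cannot be
    represented by any weight; this sketch silently ignores them.
    """
    trial_counts = Counter(trial_strata)
    pop_counts = Counter(population_strata)
    n_trial = len(trial_strata)
    n_pop = len(population_strata)
    weights = {}
    for stratum, count in trial_counts.items():
        trial_share = count / n_trial
        pop_share = pop_counts.get(stratum, 0) / n_pop
        weights[stratum] = pop_share / trial_share
    return weights


def weighted_effect(records, weights):
    """Weighted difference in mean outcome, treated minus control.

    records: iterable of (stratum, treated, outcome) tuples.
    """
    sums = {True: 0.0, False: 0.0}
    totals = {True: 0.0, False: 0.0}
    for stratum, treated, outcome in records:
        w = weights[stratum]
        sums[treated] += w * outcome
        totals[treated] += w
    return sums[True] / totals[True] - sums[False] / totals[False]


# Hypothetical trial: urban units are over-represented (6 of 8),
# and the effect is 2 in urban units and 4 in rural units.
records = [
    ("urban", True, 12.0), ("urban", True, 12.0), ("urban", True, 12.0),
    ("urban", False, 10.0), ("urban", False, 10.0), ("urban", False, 10.0),
    ("rural", True, 14.0),
    ("rural", False, 10.0),
]
# Hypothetical target population: half urban, half rural.
population = ["urban"] * 50 + ["rural"] * 50

weights = stratum_weights([r[0] for r in records], population)
naive = weighted_effect(records, {s: 1.0 for s in weights})  # unweighted
effect = weighted_effect(records, weights)                   # reweighted

print(f"trial-sample effect: {naive:.2f}")
print(f"population-weighted effect: {effect:.2f}")
```

In this toy example the weighted estimate moves toward the rural effect because rural units are under-represented in the trial. When effects do not vary across the weighted characteristics, the weighted and unweighted estimates coincide.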
