The use of propensity scores to assess the generalizability of results from randomized trials

Summary.  Randomized trials remain the most accepted design for estimating the effects of interventions, but they do not necessarily answer a question of primary interest: will the programme be effective in a target population in which it may be implemented? In other words, are the results generalizable? There has been very little statistical research on how to assess the generalizability, or ‘external validity’, of randomized trials. We propose the use of propensity‐score‐based metrics to quantify the similarity of the participants in a randomized trial and a target population. In this setting the propensity score model predicts participation in the randomized trial, given a set of covariates. The resulting propensity scores are used first to quantify the difference between the trial participants and the target population, and then to match, subclassify or weight the control group outcomes to the population, assessing how well the propensity‐score‐adjusted outcomes track the outcomes that are actually observed in the population. These metrics can serve as a first step in assessing the generalizability of results from randomized trials to target populations. The paper lays out these ideas, discusses the assumptions underlying the approach and illustrates the metrics by using data on the evaluation of a schoolwide prevention programme called ‘Positive behavioral interventions and supports’.
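The workflow described in the summary can be illustrated with a minimal sketch. It is not the paper's implementation, and everything in it is an assumption made for illustration: the toy data, the single covariate `x`, the hand-rolled logistic fit, the standardized difference in mean propensity scores as the similarity summary, and the odds weights `(1 - p) / p`, which assume the target population is a separate sample from the trial. For brevity the toy trial contains control units only.

```python
import math
import statistics

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ss, lr=0.1, iters=5000):
    """Fit P(S=1 | x) = sigmoid(a + b*x) by plain gradient ascent.

    S=1 marks trial participation, S=0 marks membership in the population sample.
    """
    a, b = 0.0, 0.0
    n = len(xs)
    for _ in range(iters):
        grad_a = grad_b = 0.0
        for x, s in zip(xs, ss):
            p = sigmoid(a + b * x)
            grad_a += s - p
            grad_b += (s - p) * x
        a += lr * grad_a / n
        b += lr * grad_b / n
    return a, b

def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# Hypothetical toy data: one covariate, outcome depends linearly on it.
trial_x = [1, 2, 2, 3, 3, 3, 4, 4]   # trial control units
pop_x = [0, 0, 1, 1, 1, 2, 2, 3]     # target population sample
trial_y = [10 + 2 * x for x in trial_x]
pop_y = [10 + 2 * x for x in pop_x]

# Step 1: model participation in the trial given the covariate.
xs = trial_x + pop_x
ss = [1] * len(trial_x) + [0] * len(pop_x)
a, b = fit_logistic(xs, ss)
trial_p = [sigmoid(a + b * x) for x in trial_x]
pop_p = [sigmoid(a + b * x) for x in pop_x]

# Step 2: summarize how different the trial is from the population
# (assumed metric: difference in mean propensity scores over their pooled SD).
smd = (statistics.mean(trial_p) - statistics.mean(pop_p)) / statistics.pstdev(trial_p + pop_p)

# Step 3: weight trial control outcomes toward the population and compare
# with the outcomes actually observed in the population.
weights = [(1 - p) / p for p in trial_p]
unweighted = statistics.mean(trial_y)
weighted = weighted_mean(trial_y, weights)
pop_mean = statistics.mean(pop_y)

print("standardized propensity score difference:", smd)
print("unweighted control mean:", unweighted)
print("weighted control mean:", weighted)
print("population mean:", pop_mean)
```

In this toy setting a large value of `smd` signals that the trial sample and the population differ on the covariate, and the gap between `weighted` and `pop_mean` shows how well the adjusted control outcomes track the population outcomes. Matching or subclassification on the propensity score could be substituted for the weighting step.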
