Improving Generalizations From Experiments Using Propensity Score Subclassification

Because units are randomly assigned to treatment, randomized experiments typically have high internal validity. However, units are rarely selected into an experiment at random from a well-defined population of interest, and this limits external validity. Under nonrandom sampling, the sample average treatment effect estimated in the experiment can be a biased estimate of the population average treatment effect. This article examines the propensity score subclassification estimator as a means of improving generalizations from experiments. It first lays out the assumptions necessary for generalization, then investigates how much bias reduction and average variance inflation are likely relative to a conventional estimator. It concludes with a discussion of issues that arise when the population of interest is not well represented by the experiment, followed by an example.
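The abstract names the estimator without spelling it out. The sketch below assumes a common setup: experimental units are pooled with a random sample from the inference population. A logistic regression of experiment membership on covariates then gives a sampling propensity score, and units are cut into k subclasses on that score. Within each subclass, the experimental difference in means is reweighted by that subclass's share of the population. Several pieces are illustrative assumptions rather than the article's own code: the helper names (`fit_logistic`, `subclassification_estimate`), the cut points at quantiles of the population's scores, the variance approximation (which treats population shares as known), and the simulated data. The "conventional estimator" is taken here to be the unadjusted difference in means.

```python
import numpy as np


def fit_logistic(X, z, n_iter=25):
    """Logistic regression of z on X (plus intercept) via Newton-Raphson."""
    A = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(A.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-A @ beta))
        info = A.T @ (A * (p * (1.0 - p))[:, None])   # Fisher information
        beta += np.linalg.solve(info, A.T @ (z - p))  # Newton step
    return beta


def subclassification_estimate(X_exp, treat, y, X_pop, k=5):
    """Propensity score subclassification estimate of the population ATE.

    X_exp, treat, y: covariates, 0/1 assignment and outcomes of experiment units.
    X_pop: covariates for a random sample from the inference population.
    Returns (estimate, standard error); population shares are treated as known.
    """
    X_all = np.concatenate([X_exp, X_pop])
    z = np.r_[np.ones(len(X_exp)), np.zeros(len(X_pop))]  # 1 = in experiment
    beta = fit_logistic(X_all, z)
    # Logit of the estimated sampling propensity score for each unit.
    lp_exp = np.column_stack([np.ones(len(X_exp)), X_exp]) @ beta
    lp_pop = np.column_stack([np.ones(len(X_pop)), X_pop]) @ beta
    # Subclass boundaries at quantiles of the population's scores (an assumption).
    cuts = np.quantile(lp_pop, np.linspace(0, 1, k + 1)[1:-1])
    s_exp, s_pop = np.searchsorted(cuts, lp_exp), np.searchsorted(cuts, lp_pop)
    est, var = 0.0, 0.0
    for j in range(k):
        w = np.mean(s_pop == j)                 # population share of subclass j
        t = (s_exp == j) & (treat == 1)
        c = (s_exp == j) & (treat == 0)
        if t.sum() < 2 or c.sum() < 2:
            raise ValueError(f"subclass {j} lacks treated or control units")
        est += w * (y[t].mean() - y[c].mean())  # reweighted within-subclass effect
        var += w**2 * (y[t].var(ddof=1) / t.sum() + y[c].var(ddof=1) / c.sum())
    return est, np.sqrt(var)


# Hypothetical simulation: enrollment favors units with large X, and the
# treatment effect is 1 + X, so the population average treatment effect is 1.
rng = np.random.default_rng(0)
pool = rng.normal(size=50_000)
X_exp = pool[rng.random(pool.size) < 1 / (1 + np.exp(3.0 - 1.5 * pool))]
X_pop = rng.normal(size=5_000)               # random sample from the population
treat = rng.integers(0, 2, size=X_exp.size)  # random assignment within the experiment
y = X_exp + treat * (1.0 + X_exp) + rng.normal(size=X_exp.size)

naive = y[treat == 1].mean() - y[treat == 0].mean()  # conventional estimate
est, se = subclassification_estimate(X_exp, treat, y, X_pop, k=5)
print(f"naive: {naive:.2f}  subclassified: {est:.2f} (SE {se:.2f})")
```

The simulated experiment over-represents units with large X, so the naive difference in means overstates the population effect. Reweighting by population subclass shares should pull the estimate toward 1. Some residual bias should remain, because five subclasses only coarsely balance X. The standard error will typically be larger than the naive one. This illustrates, in miniature, the trade-off between bias reduction and variance inflation that the abstract describes.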
