An Evaluation of Weighting Methods Based on Propensity Scores to Reduce Selection Bias in Multilevel Observational Studies

Observational studies that estimate treatment effects from multilevel data must account for both the nonrandom treatment assignment mechanism and the clustered structure of the data. We present an approach for implementing four propensity score (PS) methods with multilevel data: weights are created, scaled with one of three types of weight scaling (normalized, cluster-normalized, and effective), and then used to estimate multilevel models with the multilevel pseudo-maximum likelihood estimation method. In a Monte Carlo simulation study, the multilevel model provided unbiased estimates of the Average Treatment Effect on the Treated (ATT) and its standard error across the manipulated conditions and across combinations of PS model, PS method, and type of weight scaling. Estimates of the between-cluster variances of the ATT were biased, but the bias decreased as cluster sizes increased. We provide a step-by-step demonstration of how to combine PS methods and multilevel modeling to estimate treatment effects, using multilevel data from the Early Childhood Longitudinal Study–Kindergarten Cohort (ECLS-K).
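The weighting and scaling steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' code, and it rests on assumptions the abstract does not confirm: ATT weights are taken to be 1 for treated units and ps / (1 - ps) for untreated units, normalized scaling is taken to make the weights sum to the total sample size, cluster-normalized scaling to make them sum to each cluster's size, and effective scaling to make them sum to each cluster's effective sample size, (Σw)² / Σw². The function names and the toy data are hypothetical.

```python
import numpy as np

def att_weights(treated, ps):
    """Assumed ATT weights: 1 for treated units, ps / (1 - ps) for untreated units."""
    treated = np.asarray(treated, dtype=bool)
    ps = np.asarray(ps, dtype=float)
    return np.where(treated, 1.0, ps / (1.0 - ps))

def scale_normalized(w):
    """Rescale so the weights sum to the total sample size (assumed definition)."""
    w = np.asarray(w, dtype=float)
    return w * w.size / w.sum()

def scale_within_cluster(w, cluster, method):
    """Rescale weights separately within each cluster.

    method="cluster_normalized": weights sum to the cluster size.
    method="effective": weights sum to the effective cluster size,
    (sum of w)^2 / (sum of w^2).
    """
    w = np.asarray(w, dtype=float)
    cluster = np.asarray(cluster)
    scaled = np.empty_like(w)
    for c in np.unique(cluster):
        idx = cluster == c
        wc = w[idx]
        if method == "cluster_normalized":
            factor = wc.size / wc.sum()
        elif method == "effective":
            factor = wc.sum() / np.square(wc).sum()
        else:
            raise ValueError(f"unknown scaling method: {method}")
        scaled[idx] = wc * factor
    return scaled

# Hypothetical toy data: two clusters of three units each.
treated = np.array([1, 0, 0, 1, 0, 0])
ps = np.array([0.60, 0.50, 0.20, 0.70, 0.40, 0.25])  # assumed estimated propensity scores
cluster = np.array([0, 0, 0, 1, 1, 1])

w = att_weights(treated, ps)
w_norm = scale_normalized(w)
w_cluster = scale_within_cluster(w, cluster, "cluster_normalized")
w_eff = scale_within_cluster(w, cluster, "effective")
```

The scaled weights would then be passed to multilevel modeling software that supports weighted pseudo-maximum likelihood estimation; that step is not shown because it depends on the software used.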
