Chasing Balance and Other Recommendations for Improving Nonparametric Propensity Score Models

Abstract: In this article, we carefully examine two important implementation issues that arise when estimating propensity scores with generalized boosted models (GBM), a promising machine learning technique. First, we examine which of two approaches to tuning GBM leads to better covariate balance and better inferences about causal effects: pursuing covariate balance between the treatment groups, or tuning the propensity score model on the basis of a model fit criterion. Second, we examine how well GBM handles irrelevant covariates that are included in the estimation model. We find that chasing balance rather than model fit when estimating propensity scores yields better covariate balance and more accurate treatment effect estimates. Additionally, we find that adding irrelevant covariates to GBM increases imbalance and bias in the estimated treatment effects. These findings have useful implications for other work focused on improving methods for estimating propensity scores.
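The two tuning rules compared in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that the propensity scores produced after each boosting iteration are already available (for example, from a boosted model's staged predictions), uses inverse-probability weights for the average treatment effect (ATE), and summarizes balance as the mean absolute standardized mean difference (ASMD), scaled by the unweighted full-sample standard deviation. These choices and all function names are hypothetical.

```python
import math
from statistics import pstdev


def ate_weights(ps, treat):
    """ATE inverse-probability weights: 1/p for treated units, 1/(1-p) for controls."""
    return [1.0 / p if t == 1 else 1.0 / (1.0 - p) for p, t in zip(ps, treat)]


def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def mean_asmd(X, treat, ps):
    """Mean absolute standardized mean difference across covariates after weighting.

    Each difference is scaled by the unweighted SD of the covariate in the full
    sample (an assumption; other scalings are possible).
    """
    w = ate_weights(ps, treat)
    asmds = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        sd = pstdev(col)
        x1 = [x for x, t in zip(col, treat) if t == 1]
        w1 = [wi for wi, t in zip(w, treat) if t == 1]
        x0 = [x for x, t in zip(col, treat) if t == 0]
        w0 = [wi for wi, t in zip(w, treat) if t == 0]
        diff = weighted_mean(x1, w1) - weighted_mean(x0, w0)
        asmds.append(abs(diff) / sd if sd > 0 else 0.0)
    return sum(asmds) / len(asmds)


def log_loss(treat, ps):
    """Average Bernoulli negative log-likelihood (a model-fit criterion; lower is better)."""
    return -sum(t * math.log(p) + (1 - t) * math.log(1 - p)
                for t, p in zip(treat, ps)) / len(treat)


def select_iteration(candidates, X, treat, criterion):
    """Index of the candidate propensity-score vector that minimizes the chosen criterion.

    candidates[k] holds the propensity scores after boosting iteration k
    (hypothetical input, e.g. from a boosted model's staged predictions).
    """
    if criterion == "balance":
        scores = [mean_asmd(X, treat, ps) for ps in candidates]
    elif criterion == "fit":
        scores = [log_loss(treat, ps) for ps in candidates]
    else:
        raise ValueError("criterion must be 'balance' or 'fit'")
    return min(range(len(scores)), key=scores.__getitem__)


# Toy data: one binary covariate that is already balanced before weighting.
X = [[1.0], [0.0], [1.0], [0.0]]
treat = [1, 1, 0, 0]
candidates = [
    [0.5, 0.5, 0.5, 0.5],  # iteration 0: flat scores, perfect balance, weaker fit
    [0.8, 0.6, 0.4, 0.2],  # iteration 1: sharper scores, better fit, worse balance
]
print(select_iteration(candidates, X, treat, "balance"))  # 0
print(select_iteration(candidates, X, treat, "fit"))      # 1
```

In this toy example the two rules disagree: the sharper scores fit the treatment indicator better (log loss about 0.367 versus about 0.693) but leave an ASMD of 2/7 on a covariate that was balanced to begin with. Evaluated on the training sample, log loss typically keeps falling as iterations are added, so a fit-based rule is usually applied to held-out or cross-validated data; the balance rule, by contrast, is computed on the analysis sample itself, since the goal is balance in the data being weighted.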
