Understanding and diagnosing the potential for bias when using machine learning methods with doubly robust causal estimators

Data-adaptive methods have been proposed for estimating nuisance parameters in doubly robust semiparametric estimators of marginal causal effects. However, under near violations of the practical positivity assumption, these methods can separate the two exposure groups in terms of their propensity score densities, which can bias the estimated treatment effect. To motivate the problem, we evaluated Targeted Minimum Loss-based Estimation (TMLE) of the average treatment effect in a simulation scenario, and we highlight how the estimates diverge depending on whether the propensity score is estimated with parametric or data-adaptive methods. We then adapted an existing diagnostic tool, based on bootstrap resampling of the subjects and simulation of the outcome data, to show that estimating the propensity score with data-adaptive methods in this study may lead to large bias and poor coverage. The adapted bootstrap procedure identifies this instability and can be used as a diagnostic tool.
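
The general shape of such a bootstrap diagnostic can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the procedure used in the paper: the function names (`fit_logistic`, `aipw_ate`, `bootstrap_diagnostic`) are hypothetical, the outcome is assumed continuous with a linear model, and a simple augmented inverse-probability-weighted (AIPW) estimator with a logistic propensity score stands in for TMLE. The idea is to resample subjects, simulate new outcomes from the fitted outcome model (so the "true" effect is known), re-apply the estimator, and compare the average estimate with that known value.

```python
import numpy as np

def expit(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, n_iter=25, ridge=1e-6):
    """Logistic regression by Newton-Raphson; X must contain an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = expit(X @ beta)
        w = p * (1.0 - p)
        hess = X.T @ (X * w[:, None]) + ridge * np.eye(X.shape[1])
        grad = X.T @ (y - p) - ridge * beta
        beta = beta + np.linalg.solve(hess, grad)
    return beta

def aipw_ate(W, A, Y, bound=0.025):
    """Doubly robust (AIPW) estimate of the average treatment effect.

    Assumption: stand-in for TMLE, with a logistic propensity model and a
    linear outcome model; propensity scores are bounded away from 0 and 1.
    """
    n = len(Y)
    Xg = np.column_stack([np.ones(n), W])
    g = np.clip(expit(Xg @ fit_logistic(Xg, A)), bound, 1.0 - bound)
    XQ = np.column_stack([np.ones(n), A, W])
    coef, *_ = np.linalg.lstsq(XQ, Y, rcond=None)
    Q1 = np.column_stack([np.ones(n), np.ones(n), W]) @ coef
    Q0 = np.column_stack([np.ones(n), np.zeros(n), W]) @ coef
    QA = np.where(A == 1, Q1, Q0)
    return np.mean(Q1 - Q0 + (A / g - (1 - A) / (1 - g)) * (Y - QA))

def bootstrap_diagnostic(W, A, Y, estimator, n_boot=200, seed=0):
    """Resample subjects, simulate outcomes from a fitted model, re-estimate."""
    rng = np.random.default_rng(seed)
    n = len(Y)
    # Fit the outcome model once on the observed data.
    XQ = np.column_stack([np.ones(n), A, W])
    coef, *_ = np.linalg.lstsq(XQ, Y, rcond=None)
    sigma = np.std(Y - XQ @ coef)
    # With a linear model and no interactions, the effect implied by the
    # fitted model is the coefficient on A; it serves as the known target.
    truth = coef[1]
    estimates = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)          # resample subjects
        Wb, Ab = W[idx], A[idx]
        mean_b = np.column_stack([np.ones(n), Ab, Wb]) @ coef
        Yb = mean_b + rng.normal(0.0, sigma, n)   # simulate outcomes
        estimates[b] = estimator(Wb, Ab, Yb)
    return {"truth": truth,
            "bias": estimates.mean() - truth,
            "sd": estimates.std(ddof=1)}
```

Passing a different `estimator` callable (for example, one that fits the propensity score with a data-adaptive learner) and comparing the resulting `bias` against `sd` is one way to check whether an estimator is unstable on a given dataset; a bias that is large relative to the standard deviation would suggest poor coverage.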
