Targeted maximum likelihood estimation for a binary treatment: A tutorial

When estimating the average effect of a binary treatment (or exposure) on an outcome, methods that incorporate propensity scores, the G-formula, or targeted maximum likelihood estimation (TMLE) are preferred over naïve regression approaches, which are biased under misspecification of a parametric outcome model. In contrast, propensity score methods require correct specification of an exposure model. Double-robust methods require correct specification of only one of the two models, either the outcome model or the exposure model. Targeted maximum likelihood estimation is a semiparametric double-robust method that improves the chances of correct model specification by allowing for flexible estimation using (nonparametric) machine-learning methods. It therefore requires weaker assumptions than its competitors. We provide a step-by-step guided implementation of TMLE and illustrate it in a realistic scenario based on cancer epidemiology where the assumptions of correct model specification and positivity (ie, that every study participant has a nonzero probability of receiving each treatment level) are nearly violated. This article provides a concise and reproducible educational introduction to TMLE for a binary outcome and exposure. The reader should gain sufficient understanding of TMLE from this introductory tutorial to be able to apply the method in practice. Extensive R code is provided in easy-to-read boxes throughout the article for replicability. Stata users will find a testing implementation of TMLE and additional material in Appendix S1 and at the following GitHub repository: https://github.com/migariane/SIM-TMLE-tutorial
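The double-robustness property described above can be illustrated with a minimal sketch. It is not the article's R code: it uses only the Python standard library, a hypothetical simulated data set with one binary confounder W, and stratified means in place of machine-learning fits. All function names, coefficients, and the sample size are assumptions chosen for illustration. The initial outcome model is deliberately misspecified (it ignores W), while the exposure model is correct, so the targeting step has something to repair.

```python
import math
import random


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def logit(p):
    return math.log(p / (1.0 - p))


def mean(xs):
    return sum(xs) / len(xs)


def simulate(n, seed=1):
    """Hypothetical data: binary W confounds treatment A and outcome Y."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        w = int(rng.random() < 0.5)
        a = int(rng.random() < (0.7 if w else 0.3))
        y = int(rng.random() < expit(-1.0 + 1.0 * a + 1.5 * w))
        rows.append((w, a, y))
    return rows


def tmle_ate(rows):
    # Step 1: initial outcome model Q(A, W); misspecified on purpose (ignores W).
    q_init = {a: mean([y for _, a_i, y in rows if a_i == a]) for a in (0, 1)}
    # Step 2: exposure model g(W) = P(A = 1 | W), estimated within each level of W.
    g = {w: mean([a for w_i, a, _ in rows if w_i == w]) for w in (0, 1)}

    # Step 3: clever covariate H(A, W) = A / g(W) - (1 - A) / (1 - g(W)).
    def h(a, w):
        return a / g[w] - (1 - a) / (1.0 - g[w])

    # Fit the fluctuation parameter epsilon: logistic regression of Y on H,
    # no intercept, offset logit(Q), solved by one-dimensional Newton-Raphson.
    eps = 0.0
    for _ in range(50):
        score = 0.0
        info = 0.0
        for w, a, y in rows:
            p = expit(logit(q_init[a]) + eps * h(a, w))
            score += h(a, w) * (y - p)
            info += h(a, w) ** 2 * p * (1.0 - p)
        step = score / info
        eps += step
        if abs(step) < 1e-10:
            break

    # Steps 4-5: update the predictions under A = 1 and A = 0, then average.
    diffs = []
    for w, _, _ in rows:
        q1_star = expit(logit(q_init[1]) + eps * h(1, w))
        q0_star = expit(logit(q_init[0]) + eps * h(0, w))
        diffs.append(q1_star - q0_star)
    return mean(diffs)


rows = simulate(20000)
# Naive contrast: difference in outcome means, ignoring the confounder W.
naive = (mean([y for _, a, y in rows if a == 1])
         - mean([y for _, a, y in rows if a == 0]))
ate_tmle = tmle_ate(rows)
print("naive difference in means:", round(naive, 3))
print("TMLE estimate of the ATE: ", round(ate_tmle, 3))
```

Under this simulation design the true average treatment effect is about 0.21. The naive contrast should land well above that value, whereas the targeted estimate should land close to it, because the correctly specified exposure model compensates for the misspecified outcome model. In practice both models would typically be fitted with flexible learners rather than stratified means.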
