Outcome‐adaptive lasso: Variable selection for causal inference

Methodological advancements, including propensity score methods, have resulted in improved unbiased estimation of treatment effects from observational data. Traditionally, a "throw in the kitchen sink" approach has been used to select covariates for inclusion in the propensity score, but recent work shows that including unnecessary covariates can affect both the bias and the statistical efficiency of propensity score estimators. In particular, including covariates that affect the exposure but not the outcome can inflate standard errors without reducing bias, while including covariates that are associated with the outcome but unrelated to the exposure can improve precision. We propose the outcome-adaptive lasso for selecting covariates for inclusion in propensity score models, so as to account for confounding bias while maintaining statistical efficiency. The proposed approach can perform variable selection in the presence of a large number of spurious covariates, that is, covariates unrelated to both the outcome and the exposure. We present theoretical and simulation results indicating that the outcome-adaptive lasso selects the propensity score model that includes all true confounders and predictors of the outcome, while excluding other covariates. We illustrate covariate selection with the outcome-adaptive lasso, including comparison with alternative approaches, using simulated data and data from a survey of patients using opioid therapy to manage chronic pain.
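The selection idea summarized above can be illustrated with a minimal sketch. The abstract does not spell out the estimator, so the following rests on stated assumptions rather than on the authors' implementation: the propensity score is a logistic regression with an adaptive-lasso-style penalty, the penalty weight for covariate j is |beta_j|^(-gamma) with beta_j taken from an ordinary least squares fit of the outcome on exposure and covariates, and the penalized fit is computed by plain proximal gradient descent. The function names, the simulated data, gamma = 2, and lam = 0.01 are all hypothetical choices made for illustration.

```python
import numpy as np

def outcome_adaptive_weights(X, A, Y, gamma=2.0):
    """Penalty weights |beta_j|^(-gamma) from an OLS fit of Y on (1, A, X).

    Assumption of this sketch: covariates with small outcome coefficients
    receive large penalties in the propensity score model.
    """
    n = X.shape[0]
    design = np.column_stack([np.ones(n), A, X])
    coef, *_ = np.linalg.lstsq(design, Y, rcond=None)
    beta_x = coef[2:]  # drop the intercept and the exposure coefficient
    return np.abs(beta_x) ** (-gamma)

def weighted_l1_logistic(X, A, weights, lam, n_iter=500):
    """Proximal gradient descent for the mean logistic loss plus
    lam * sum_j weights[j] * |alpha_j|; the intercept is not penalized."""
    n, p = X.shape
    design = np.column_stack([np.ones(n), X])
    # 1/L step size, where L bounds the curvature of the mean logistic loss.
    step = 4.0 * n / np.linalg.norm(design, 2) ** 2
    intercept, alpha = 0.0, np.zeros(p)
    for _ in range(n_iter):
        p_hat = 1.0 / (1.0 + np.exp(-(intercept + X @ alpha)))
        resid = p_hat - A
        intercept -= step * resid.mean()
        z = alpha - step * (X.T @ resid) / n
        # Soft-thresholding with a covariate-specific threshold.
        alpha = np.sign(z) * np.maximum(np.abs(z) - step * lam * weights, 0.0)
    return intercept, alpha

# Simulated data (hypothetical design). Columns of X:
# 0-1 confounders, 2-3 outcome-only predictors,
# 4-5 exposure-only predictors, 6-7 spurious covariates.
rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 8))
logit = X[:, 0] + X[:, 1] + X[:, 4] + X[:, 5]
A = rng.binomial(1, 1.0 / (1.0 + np.exp(-logit)))
Y = 1.0 * A + X[:, :4].sum(axis=1) + rng.normal(size=n)

w = outcome_adaptive_weights(X, A, Y, gamma=2.0)
intercept_hat, alpha_hat = weighted_l1_logistic(X, A, w, lam=0.01)
selected = np.flatnonzero(alpha_hat)
print("selected covariate indices:", selected)
```

In this sketch the exposure-only and spurious covariates receive very large penalty weights because their estimated outcome coefficients are close to zero, so their propensity score coefficients are thresholded to zero, while the confounders are lightly penalized and retained. Whether the outcome-only predictors are retained in a given finite sample depends on the tuning parameter, which is fixed arbitrarily here and would need to be chosen by some data-driven criterion in practice.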
