High-Dimensional Confounding Adjustment Using Continuous Spike and Slab Priors.

In observational studies, estimating the causal effect of a treatment on an outcome relies on proper adjustment for confounding. If the number of potential confounders (p) is larger than the number of observations (n), direct control for all potential confounders is infeasible. Existing approaches to dimension reduction and penalization are generally aimed at predicting the outcome and are less suited to estimating causal effects. Under standard penalization approaches (e.g., the Lasso), if a variable Xj is strongly associated with the treatment T but only weakly associated with the outcome Y, its coefficient βj will be shrunk toward zero, leading to confounding bias. Under the assumptions of a linear model for the outcome and sparsity, we propose continuous spike and slab priors on the regression coefficients βj corresponding to the potential confounders Xj. Specifically, we introduce a prior distribution that does not heavily shrink toward zero the coefficients βj of those Xj that are strongly associated with T but weakly associated with Y. We compare our proposed approach to several state-of-the-art methods from the literature. Our proposed approach has the following features: 1) it reduces confounding bias in high-dimensional settings; 2) it shrinks the coefficients of instrumental variables toward zero; and 3) it achieves good coverage even in small samples. We apply our approach to data from the National Health and Nutrition Examination Survey (NHANES) to estimate the causal effects of persistent pesticide exposure on triglyceride levels.
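The confounding bias described above can be illustrated with a small simulation. The sketch below is not the proposed spike and slab method. It only shows the limiting case in which a penalty shrinks the coefficient of Xj all the way to zero, which is equivalent to omitting Xj from the outcome model. The function names, the sample size, and the values `gamma`, `beta_x`, and `effect` are assumptions chosen for illustration.

```python
import random


def simulate(n=5000, seed=0, effect=1.0, gamma=2.0, beta_x=0.3):
    """Draw (X, T, Y) where X strongly drives T but only weakly drives Y."""
    rng = random.Random(seed)
    xs, ts, ys = [], [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        t = gamma * x + rng.gauss(0.0, 1.0)                   # strong X -> T
        y = effect * t + beta_x * x + rng.gauss(0.0, 1.0)     # weak X -> Y
        xs.append(x)
        ts.append(t)
        ys.append(y)
    return xs, ts, ys


def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(u * v for u, v in zip(a, b))


def effect_without_x(t, y):
    """Least-squares slope of Y on T alone (coefficient of X forced to zero).

    No intercept is fitted because all variables have mean zero by design.
    """
    return dot(t, y) / dot(t, t)


def effect_with_x(x, t, y):
    """Coefficient of T from least squares of Y on (T, X), via 2x2 normal equations."""
    stt, sxx, stx = dot(t, t), dot(x, x), dot(t, x)
    sty, sxy = dot(t, y), dot(x, y)
    det = stt * sxx - stx * stx
    return (sty * sxx - stx * sxy) / det


if __name__ == "__main__":
    x, t, y = simulate()
    print("true effect:            ", 1.0)
    print("estimate, X omitted:    ", effect_without_x(t, y))
    print("estimate, X adjusted:   ", effect_with_x(x, t, y))
```

Under this data-generating process, the estimate that omits X is biased by approximately beta_x · Cov(X, T) / Var(T) = 0.3 · 2 / 5 = 0.12, while the adjusted estimate is centered on the true effect. A penalty that shrinks βj only partway toward zero would produce a bias between these two extremes.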
