Multivariate Bayesian variable selection exploiting dependence structure among outcomes: Application to air pollution effects on DNA methylation

The analysis of multiple outcomes is becoming increasingly common in modern biomedical studies. It is well known that joint statistical models for multiple outcomes are more flexible than fitting a separate model for each outcome; by taking into account the dependence among outcomes and pooling evidence across them, they also yield more powerful tests of exposure or treatment effects. It is, however, unlikely that all outcomes are related to the same subset of covariates. There is therefore interest in identifying the exposures or treatments associated with particular outcomes, which we term outcome‐specific variable selection. In this work, we propose a variable selection approach for multivariate normal responses that incorporates information not only on the mean model but also on the variance–covariance structure of the outcomes. The approach leverages evidence from all correlated outcomes to estimate the effect of a particular covariate on a given outcome. To implement this strategy, we develop a Bayesian method that builds a multivariate prior for the variable selection indicators based on the variance–covariance of the outcomes. We show via simulation that the proposed variable selection strategy can boost power to detect subtle effects without increasing the probability of false discoveries. We apply the approach to epigenetic data from the Normative Aging Study (NAS) and identify a subset of five genes in the asthma pathway for which gene‐specific DNA methylation is associated with exposure to either black carbon, a marker of traffic pollution, or sulfate, a marker of particles generated by power plants.
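
One way to picture a multivariate prior on the selection indicators that borrows the dependence structure of the outcomes is the following minimal sketch. It is an illustration only, not the construction used in the paper: it assumes a latent Gaussian vector whose correlation matrix equals a hypothetical outcome correlation matrix, and sets the indicator for an outcome to 1 when its latent value exceeds a threshold. The function name `sample_indicators`, the threshold rule, and the example correlation matrix are all assumptions made for this sketch.

```python
from statistics import NormalDist

import numpy as np


def sample_indicators(outcome_corr, prior_inclusion=0.5, n_draws=20000, seed=0):
    """Draw selection indicators for one covariate across K outcomes.

    Assumed construction (hypothetical, for illustration only):
    z ~ N(0, outcome_corr) and gamma_k = 1 if z_k exceeds a threshold
    chosen so that each marginal inclusion probability is prior_inclusion.
    """
    rng = np.random.default_rng(seed)
    k = outcome_corr.shape[0]
    # Threshold giving the requested marginal prior inclusion probability.
    threshold = NormalDist().inv_cdf(1.0 - prior_inclusion)
    z = rng.multivariate_normal(np.zeros(k), outcome_corr, size=n_draws)
    return (z > threshold).astype(int)


# Hypothetical correlation among three outcomes: outcomes 1 and 2 are
# strongly correlated, outcome 3 is independent of both.
corr = np.array([
    [1.0, 0.8, 0.0],
    [0.8, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])

gamma = sample_indicators(corr)
marginal = gamma.mean(axis=0)                  # per-outcome inclusion rate
co_12 = (gamma[:, 0] & gamma[:, 1]).mean()     # joint inclusion, correlated pair
co_13 = (gamma[:, 0] & gamma[:, 2]).mean()     # joint inclusion, independent pair

print("marginal inclusion:", marginal)
print("joint inclusion (1,2):", co_12, " (1,3):", co_13)
```

Under this assumed prior, each indicator keeps the same marginal inclusion probability, but a covariate is more likely a priori to be selected jointly for strongly correlated outcomes than for independent ones. That is the sense in which evidence for an effect on one outcome can inform selection for a related outcome.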
