Integrative variable selection via Bayesian model uncertainty

We are interested in developing integrative approaches for variable selection problems that incorporate external knowledge on a set of predictors of interest. In particular, we have developed an integrative Bayesian model uncertainty (iBMU) method, which formally incorporates multiple sources of data via a second-stage probit model on the probability that any predictor is associated with the outcome of interest. Using simulations, we demonstrate that iBMU has greater power to detect true marginal associations than more commonly used variable selection techniques, such as the least absolute shrinkage and selection operator (LASSO) and the elastic net. In addition, iBMU yields a more efficient model search algorithm than the basic BMU method, even when the predictor-level covariates are only modestly informative. The gains in power and efficiency become more substantial as the predictor-level covariates become more informative. Finally, we demonstrate the power and flexibility of iBMU by integrating both gene structure and functional biomarker information into a candidate gene study from the Pharmacogenetics of Nicotine Addiction and Treatment Consortium, which investigates over 50 genes in the brain reward system and their role in smoking cessation.
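To make the second-stage idea concrete, the following is a minimal sketch of how a probit model on predictor-level covariates could turn external annotations into prior inclusion probabilities. It is an illustration under stated assumptions, not the authors' implementation: the names `Z`, `alpha`, `inclusion_probs`, and `log_model_prior`, the toy covariate values, and the assumption that predictors enter the model prior independently are all hypothetical and are not taken from the abstract.

```python
import math


def probit(x):
    """Standard normal CDF, Phi(x), computed from the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def inclusion_probs(Z, alpha):
    """Prior probability that each predictor is associated with the outcome.

    Z     : list of rows, one per predictor, of predictor-level covariates
            (hypothetical layout: intercept first, then annotations)
    alpha : second-stage probit coefficients (hypothetical values)
    """
    return [probit(sum(z * a for z, a in zip(row, alpha))) for row in Z]


def log_model_prior(gamma, probs):
    """Log prior of a model, given 0/1 inclusion indicators gamma.

    Assumes predictors enter independently given their covariates,
    which is an illustrative simplification.
    """
    return sum(
        math.log(p) if g else math.log(1.0 - p)
        for g, p in zip(gamma, probs)
    )


# Toy example: two predictors, each with an intercept and one binary annotation.
Z = [
    [1.0, 1.0],  # predictor with supporting external annotation
    [1.0, 0.0],  # predictor without it
]
alpha = [-1.0, 1.0]  # assumed coefficients, chosen only for illustration

probs = inclusion_probs(Z, alpha)
log_prior = log_model_prior([1, 0], probs)
```

With these assumed coefficients, the annotated predictor has prior inclusion probability Phi(0) = 0.5, while the unannotated one has Phi(-1), roughly 0.16, so a model search guided by this prior would tend to visit models containing the annotated predictor more often.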
