Bayesian compositional regression with structured priors for microbiome feature selection

The microbiome plays a critical role in human health and disease, and there is strong scientific interest in linking specific features of the microbiome to clinical outcomes. However, key aspects of microbiome data limit the applicability of standard variable selection methods. In particular, the observed data are compositional: the counts within each sample are subject to a fixed-sum constraint. In addition, microbiome features, typically quantified as operational taxonomic units (OTUs), often reflect microorganisms that are similar in function and may therefore have a similar influence on the response variable. To address the challenges posed by these aspects of the data structure, we propose a variable selection technique with the following novel features: a generalized transformation and z-prior to handle the compositional constraint, and an Ising prior that encourages the joint selection of microbiome features that are closely related in terms of their genetic sequence similarity. We demonstrate that the proposed method outperforms existing penalized approaches for microbiome variable selection, both in simulation and in an analysis of real data exploring the relationship between the gut microbiome and body mass index (BMI).
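To make the two structural ideas in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes a common textbook form of the Ising prior on binary inclusion indicators, log p(gamma) = a * sum_j gamma_j + b * sum_{j<k} s_jk * gamma_j * gamma_k + const, where the function names, the hyperparameters `a` and `b`, and the toy similarity matrix are all hypothetical choices made for illustration. The generalized transformation and z-prior are not reproduced here.

```python
def closure(counts):
    """Rescale raw counts to proportions summing to 1 (the fixed-sum constraint)."""
    total = sum(counts)
    return [c / total for c in counts]


def ising_log_prior(gamma, similarity, a=-2.0, b=1.0):
    """Unnormalized log prior for binary inclusion indicators gamma.

    Assumed form (illustrative only):
        a * sum_j gamma_j + b * sum_{j<k} s_jk * gamma_j * gamma_k
    A negative `a` favors sparse models; a positive `b` rewards selecting
    pairs of features with high similarity s_jk together.
    """
    p = len(gamma)
    sparsity = a * sum(gamma)
    smoothing = b * sum(
        similarity[j][k] * gamma[j] * gamma[k]
        for j in range(p)
        for k in range(j + 1, p)
    )
    return sparsity + smoothing


# Toy similarity matrix (hypothetical): features 0 and 1 are closely related,
# feature 2 is distant from both.
similarity = [
    [0.0, 0.9, 0.1],
    [0.9, 0.0, 0.1],
    [0.1, 0.1, 0.0],
]

proportions = closure([10, 30, 60])
related = ising_log_prior([1, 1, 0], similarity)    # selects the similar pair
unrelated = ising_log_prior([1, 0, 1], similarity)  # selects a dissimilar pair
```

Under these assumed settings, both models include two features, so the sparsity term is identical; the model that selects the closely related pair receives the higher prior weight, which is the "joint selection" behavior the abstract describes.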
