Bayesian Graphical Compositional Regression for Microbiome Data

Abstract An important task in microbiome studies is to test for and characterize differences in microbiome composition across groups of samples. Key challenges include the large within-group heterogeneity among samples and the presence of potential confounding variables that, when ignored, increase the chance of false discoveries and reduce the power to identify true differences. We propose a probabilistic framework to overcome these issues by combining three ideas: (i) a phylogenetic tree-based decomposition of the cross-group comparison problem into a series of local tests, (ii) a graphical model that links the local tests to allow information sharing across taxa, and (iii) a Bayesian testing strategy that incorporates covariates and integrates out the within-group variation, avoiding potentially unstable point estimates. With the proposed method, we analyze the American Gut data to compare the gut microbiome composition of groups of participants with different dietary habits. Our analysis shows that (i) the frequencies of consuming fruit, seafood, vegetables, and whole grains are closely related to the gut microbiome composition and (ii) the conclusions of the analysis can change drastically when different sets of relevant covariates are adjusted for, indicating the necessity of carefully selecting and including possible confounders when comparing microbiome compositions using data from observational studies. Supplementary materials for this article, including a standardized description of the materials available for reproducing the work, are available as an online supplement.
