Estimating Microbial Interaction Network: Zero-inflated Latent Ising Model Based Approach

Motivation: Throughout their lifespans, humans continually interact with the microbial world, including the organisms that live in and on the human body. Research in this domain has revealed extensive links between the human-associated microbiota and health. In particular, the human gut microbiota plays essential roles in digestion, nutrient metabolism, immune maturation and homeostasis, neurological signaling, and endocrine regulation. Microbial interaction networks, which are frequently estimated from data, are an indispensable tool for representing and understanding the relationships among the microbes of a microbiota. In this high-dimensional setting, the zero-inflated and compositional structure of the data (relative abundances are subject to a unit-sum constraint) poses challenges to the accurate estimation of microbial interaction networks.

Method: We propose the zero-inflated latent Ising (ZILI) model for microbial interaction networks, which assumes that the distribution of the relative abundances of the microbiota is determined by a finite number of latent states. This assumption is partly supported by existing findings in the literature [20]. Under the stated assumptions, the ZILI model circumvents the unit-sum constraint and alleviates the zero-inflation problem. For model selection under ZILI, we propose a two-step algorithm. ZILI and the two-step algorithm are evaluated on simulated data and then applied to an infant gut microbiome dataset from the New Hampshire Birth Cohort Study. The results are compared with those from the traditional Gaussian graphical model (GGM) and the dichotomous Ising model (DIS).

Results: The simulation studies show that, when ZILI is the true generative model for the data, the two-step algorithm estimates the graphical structure effectively and is robust across a range of alternative settings of the related factors. Neither GGM nor DIS achieves satisfactory performance in these settings.
For the infant gut microbiome dataset, we use both ZILI and GGM to estimate the microbial interaction network. The two estimated networks share a statistically significant overlap, with ZILI and the two-step algorithm tending to select a sparser network than GGM. From the shared subnetwork, we identify a hub taxon, Lachnospiraceae, whose involvement in the development of human disease has recently been reported in the literature.

Availability: The data and programs used in Sections 4 and 5 are available on request from the corresponding author.

Contact: Anne.G.Hoen@dartmouth.edu

Supplementary information: Supplementary materials are available at Bioinformatics.
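The abstract does not spell out the ZILI likelihood or the two-step algorithm, so the following is only a minimal sketch, under stated assumptions, of two ingredients it alludes to. First, a discrete state that depends only on whether a taxon is observed is unchanged when a sample's counts are rescaled to sum to one, which is one way a latent-state model can sidestep the unit-sum constraint. Second, for 0/1 states with joint probability proportional to exp(Σ_j θ_j x_j + Σ_{j<k} θ_jk x_j x_k), each node's conditional distribution is logistic, P(x_j = 1 | x_{-j}) = σ(θ_j + Σ_{k≠j} θ_jk x_k), so the Ising graph can be estimated node by node. The sketch fits each such regression with an ℓ1 penalty, in the spirit of [32], and keeps an edge only when both regressions select it (the "AND" rule of [13]). Using just two states (presence/absence) makes this closer to the dichotomous Ising (DIS) comparison than to ZILI itself, which allows a general finite set of latent states. The function names, the fixed penalty `lam`, and the plain proximal-gradient solver are hypothetical illustration choices, not the authors' implementation.

```python
import math

# Hypothetical sketch: a two-state (presence/absence) Ising network estimated
# by nodewise l1-penalized logistic regression. This is NOT the ZILI
# two-step algorithm, whose details are not given in the abstract.


def to_states(abundances):
    """Map each sample to binary presence/absence states (1 = nonzero).

    Dividing a sample by its total (the unit-sum closure) never changes which
    entries are zero, so counts and relative abundances give the same states.
    """
    return [[1 if x > 0 else 0 for x in row] for row in abundances]


def _sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def _soft_threshold(z, t):
    # Proximal operator of t * |w|: shrink z toward zero by t.
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def l1_logistic(X, y, lam, lr=1.0, n_iter=2000):
    """Fit an l1-penalized logistic regression by proximal gradient descent.

    Minimizes mean log-loss + lam * sum(|w_j|); the intercept is unpenalized.
    """
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(n_iter):
        # Residuals (fitted probability minus observed state) for each sample.
        r = [_sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
             for xi, yi in zip(X, y)]
        grad_b = sum(r) / n
        grad_w = [sum(r[i] * X[i][j] for i in range(n)) / n for j in range(p)]
        b -= lr * grad_b
        w = [_soft_threshold(w[j] - lr * grad_w[j], lr * lam) for j in range(p)]
    return w, b


def ising_neighborhood_selection(states, lam):
    """Estimate an Ising graph by regressing each node on all the others.

    Applies the "AND" rule: edge j-k is kept only if both node-wise
    regressions give it a nonzero coefficient. Returns a boolean matrix.
    """
    p = len(states[0])
    coef = [[0.0] * p for _ in range(p)]
    for j in range(p):
        others = [k for k in range(p) if k != j]
        X = [[row[k] for k in others] for row in states]
        y = [row[j] for row in states]
        w, _ = l1_logistic(X, y, lam)
        for k, wk in zip(others, w):
            coef[j][k] = wk
    return [[j != k and coef[j][k] != 0.0 and coef[k][j] != 0.0
             for k in range(p)] for j in range(p)]


if __name__ == "__main__":
    # Toy counts (4 samples x 4 taxa): taxon 1 is present exactly when taxon 0
    # is, taxon 3 is present exactly when taxon 2 is absent, and the two pairs
    # are balanced against each other.
    counts = [[12, 5, 30, 0], [7, 9, 0, 4], [0, 0, 25, 0], [0, 0, 0, 18]]
    relative = [[c / sum(row) for c in row] for row in counts]
    assert to_states(counts) == to_states(relative)
    graph = ising_neighborhood_selection(to_states(relative), lam=0.05)
    print([(j, k) for j in range(4) for k in range(j + 1, 4) if graph[j][k]])
    # prints [(0, 1), (2, 3)]
```

The penalty `lam` is fixed here only to keep the toy example small; selecting it over a grid with an extended BIC, as discussed in [6] and [31], would be a natural refinement. A production analysis would more likely call an established solver such as glmnet [30] than this hand-written proximal-gradient loop.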

[1] J. Raes et al. Microbial interactions: from networks to models, 2012, Nature Reviews Microbiology.

[2] Jian-Gao Fan et al. Gut microbiota dysbiosis in patients with non-alcoholic fatty liver disease, 2017, Hepatobiliary & Pancreatic Diseases International (HBPD INT).

[3] Michael I. Jordan et al. Graphical Models, Exponential Families, and Variational Inference, 2008, Found. Trends Mach. Learn.

[4] T. Ideker et al. Integrative approaches for finding modular structure in biological networks, 2013, Nature Reviews Genetics.

[5] Hongzhe Li. Microbiome, Metagenomics, and High-Dimensional Compositional Data Analysis, 2015.

[6] Zehua Chen et al. Extended BIC for small-n-large-P sparse GLM, 2012.

[7] Jie Cheng et al. A sparse Ising model with covariates, 2014, Biometrics.

[8] D. M. Ward et al. 16S rRNA sequences reveal numerous uncultured microorganisms in a natural community, 1990, Nature.

[9] P. Deb. Finite Mixture Models, 2008.

[10] Wenjiang J. Fu. Penalized Regressions: The Bridge versus the Lasso, 1998.

[11] Philip A. Romero et al. Microbial Interaction Network Inference in Microfluidic Droplets, 2019, Cell Systems.

[12] Li Chen et al. GMPR: A robust normalization method for zero-inflated count data with application to microbiome sequencing data, 2018, PeerJ.

[13] N. Meinshausen et al. High-dimensional graphs and variable selection with the Lasso, 2006, arXiv:math/0608017.

[14] Jean M. Macklaim et al. Microbiome Datasets Are Compositional: And This Is Not Optional, 2017, Front. Microbiol.

[15] H. Zou et al. Regularization and variable selection via the elastic net, 2005.

[16] Rob Knight et al. Defining the human microbiome, 2012, Nutrition Reviews.

[17] Jose A. Navas-Molina et al. Balance Trees Reveal Microbial Niche Differentiation, 2017, mSystems.

[18] Paul J. McMurdie et al. DADA2: High resolution sample inference from Illumina amplicon data, 2016, Nature Methods.

[19] David A. Orlando et al. Revisiting Global Gene Expression Analysis, 2012, Cell.

[20] D. Hunter et al. mixtools: An R Package for Analyzing Mixture Models, 2009.

[21] R. Tibshirani et al. Sparse inverse covariance estimation with the graphical lasso, 2008, Biostatistics.

[22] John Walshaw et al. Discovery of intramolecular trans-sialidases in human gut microbiota suggests novel mechanisms of mucosal adaptation, 2015, Nature Communications.

[23] Simeone Marino et al. Mathematical modeling of primary succession of murine intestinal microbiota, 2013, Proceedings of the National Academy of Sciences.

[24] Jun Wang et al. Boolean analysis reveals systematic interactions among low-abundance species in the human gut microbiome, 2017, PLoS Comput. Biol.

[25] Michael A. McGuckin et al. Mucolytic Bacteria With Increased Prevalence in IBD Mucosa Augment In Vitro Utilization of Mucin by Other Bacteria, 2010, The American Journal of Gastroenterology.

[26] Jonathan Friedman et al. Inferring Correlation Networks from Genomic Survey Data, 2012, PLoS Comput. Biol.

[27] Lei Tang. More than microbial relative abundances, 2019, Nature Methods.

[28] P. Bühlmann et al. The group lasso for logistic regression, 2008.

[29] Lei Liu et al. Statistical Analysis of Zero-Inflated Nonnegative Continuous Data: A Review, 2019, Statistical Science.

[30] Trevor Hastie et al. Regularization Paths for Generalized Linear Models via Coordinate Descent, 2010, Journal of Statistical Software.

[31] Jiahua Chen et al. Extended Bayesian information criteria for model selection with large model spaces, 2008.

[32] J. Lafferty et al. High-dimensional Ising model selection using ℓ1-regularized logistic regression, 2010, arXiv:1010.0311.

[33] Geoffrey J. McLachlan et al. Finite Mixture Models, 2019, Annual Review of Statistics and Its Application.

[34] Matthew C. B. Tsilimigras et al. Compositional data analysis of the microbiome: fundamentals, tools, and challenges, 2016, Annals of Epidemiology.

[35] Moshe Sniedovich et al. Dynamic Programming, 1991.

[36] Jun Sun et al. Modeling Zero-Inflated Microbiome Data, 2018.

[37] Stefanie Widder et al. Deciphering microbial interactions and detecting keystone species with co-occurrence networks, 2014, Front. Microbiol.

[38] John Aitchison. The Statistical Analysis of Compositional Data, 1986.

[39] Noah Fierer et al. Using network analysis to explore co-occurrence patterns in soil microbial communities, 2011, The ISME Journal.

[40] Vladimir Jojic et al. Learning Microbial Interaction Networks from Metagenomic Count Data, 2014, J. Comput. Biol.

[41] P. J. Hughesdon et al. The Struggle for Existence, 1927, Nature.

[42] Jürg Bähler et al. Proportionality: A Valid Alternative to Correlation for Relative Data, 2014, bioRxiv.

[43] Teeratorn Kadeethum et al. Physics-informed neural networks for solving nonlinear diffusivity and Biot’s equations, 2020, PLoS ONE.

[44] Mark D. Robinson et al. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data, 2009, Bioinformatics.

[45] Michael A. Fischbach et al. A biosynthetic pathway for a prominent class of microbiota-derived bile acids, 2015, Nature Chemical Biology.

[46] Russell V. Lenth et al. Response-Surface Methods in R, Using rsm, 2009.

[47] Rob Knight et al. Analysis of composition of microbiomes: a novel method for studying microbial composition, 2015, Microbial Ecology in Health and Disease.

[48] Valeria Sagheddu et al. Infant Early Gut Colonization by Lachnospiraceae: High Frequency of Ruminococcus gnavus, 2016, Front. Pediatr.

[49] Xing Qiu et al. High-dimensional linear state space models for dynamic microbial interaction networks, 2017, PLoS ONE.

[50] D. van Sinderen et al. Gut microbiota composition correlates with diet and health in the elderly, 2012, Nature.

[51] W. Huber et al. MAnorm: a robust model for quantitative comparison of ChIP-Seq data sets, 2011.

[52] M. Sniedovich. Dynamic Programming: Foundations and Principles, 2011.

[53] Zaid Abdo et al. Temporal Dynamics of the Human Vaginal Microbiota, 2012, Science Translational Medicine.