Learning a mixture of microbial networks using minorization–maximization

Abstract

Motivation: The interactions among the constituent members of a microbial community play a major role in determining the overall behavior of the community and the abundance levels of its members. These interactions can be modeled using a network whose nodes represent microbial taxa and whose edges represent pairwise interactions. A microbial network is typically constructed from a sample-taxa count matrix, which is obtained by sequencing multiple biological samples and identifying taxa counts. Large-scale microbiome studies show that microbial community compositions and interactions are affected by environmental and/or host factors. It is therefore reasonable to expect that a sample-taxa matrix generated as part of a large study involving multiple environmental or clinical parameters can be associated with more than one microbial network. However, to our knowledge, the microbial network inference methods proposed thus far assume that the sample-taxa matrix is associated with a single network.

Results: We present a mixture model framework to address the scenario in which the sample-taxa matrix is associated with K microbial networks. The count matrix is modeled using a mixture of K Multivariate Poisson Log-Normal (MPLN) distributions, and the parameters are estimated using a maximum likelihood framework. Our parameter estimation algorithm is based on the minorization–maximization principle combined with gradient ascent and block updates. Synthetic datasets were generated to assess the performance of our approach on absolute count data, compositional data and normalized data. We also addressed the recovery of sparse networks using an l1-penalty model.

Availability and implementation: MixMPLN is implemented in R and is freely available at https://github.com/sahatava/MixMPLN.

Supplementary information: Supplementary data are available at Bioinformatics online.
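To make the generative model behind the abstract concrete, the following is a minimal Python sketch of drawing a synthetic sample-taxa count matrix from a mixture of K Multivariate Poisson Log-Normal components. It is an illustration under stated assumptions, not the MixMPLN implementation (which is in R): each sample first picks a component, then draws a latent Gaussian vector with that component's mean and covariance, and finally draws each taxon count from a Poisson distribution whose rate is the exponential of the latent value. The function names (`cholesky`, `sample_poisson`, `sample_mixture_mpln`) and all parameter values are hypothetical and chosen only for this example.

```python
import math
import random


def cholesky(sigma):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    d = len(sigma)
    low = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(sigma[i][i] - s)
            else:
                low[i][j] = (sigma[i][j] - s) / low[j][j]
    return low


def sample_poisson(rate, rng):
    """Knuth's multiplication method; adequate for the moderate rates used here."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def sample_mixture_mpln(n_samples, weights, means, covariances, seed=0):
    """Draw counts from a K-component MPLN mixture (illustrative sketch only).

    weights[k]     : mixing proportion of component k
    means[k]       : latent Gaussian mean vector of component k
    covariances[k] : latent Gaussian covariance matrix of component k
    Returns (counts, labels): an n_samples x d count matrix and the
    component index that generated each row.
    """
    rng = random.Random(seed)
    factors = [cholesky(c) for c in covariances]
    counts, labels = [], []
    for _ in range(n_samples):
        # Pick the component (i.e. the underlying network) for this sample.
        k = rng.choices(range(len(weights)), weights=weights)[0]
        d = len(means[k])
        # Latent log-abundances: mean + L z, with z standard normal.
        z = [rng.gauss(0.0, 1.0) for _ in range(d)]
        latent = [
            means[k][i] + sum(factors[k][i][j] * z[j] for j in range(i + 1))
            for i in range(d)
        ]
        # Observed counts: independent Poisson draws given the latent vector.
        counts.append([sample_poisson(math.exp(v), rng) for v in latent])
        labels.append(k)
    return counts, labels


# Hypothetical two-component example with three taxa.
weights = [0.6, 0.4]
means = [[1.0, 2.0, 0.5], [2.0, 0.5, 1.5]]
covariances = [
    [[0.5, 0.3, 0.0], [0.3, 0.5, 0.0], [0.0, 0.0, 0.5]],
    [[0.5, 0.0, -0.3], [0.0, 0.5, 0.0], [-0.3, 0.0, 0.5]],
]
counts, labels = sample_mixture_mpln(200, weights, means, covariances, seed=1)
```

In this sketch the two components differ in which taxa pairs are correlated in the latent space, which is the sense in which each component corresponds to its own network. Recovering the component parameters from `counts` alone is the estimation problem the abstract describes; the sketch covers only the data-generating side.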
