Sparse and Compositionally Robust Inference of Microbial Ecological Networks

16S ribosomal RNA (rRNA) gene and other environmental sequencing techniques provide snapshots of microbial communities, revealing phylogeny and the abundances of microbial populations across diverse ecosystems. While changes in microbial community structure are demonstrably associated with certain environmental conditions (from metabolic and immunological health in mammals to ecological stability in soils and oceans), identification of underlying mechanisms requires new statistical tools, as these datasets present several technical challenges. First, the abundances of microbial operational taxonomic units (OTUs) from amplicon-based datasets are compositional: counts are normalized to the total number of counts in the sample. Thus, microbial abundances are not independent, and traditional statistical metrics (e.g., correlation) for the detection of OTU-OTU relationships can lead to spurious results. Second, microbial sequencing-based studies typically measure hundreds of OTUs on only tens to hundreds of samples; thus, inference of OTU-OTU association networks is severely under-powered, and additional information or assumptions are required for accurate inference. Here, we present SPIEC-EASI (SParse InversE Covariance Estimation for Ecological Association Inference), a statistical method for the inference of microbial ecological networks from amplicon sequencing datasets that addresses both of these issues. SPIEC-EASI combines data transformations developed for compositional data analysis with a graphical model inference framework that assumes the underlying ecological association network is sparse. To reconstruct the network, SPIEC-EASI relies on algorithms for sparse neighborhood and inverse covariance selection. To provide a synthetic benchmark in the absence of an experimentally validated gold-standard network, SPIEC-EASI is accompanied by a set of computational tools to generate OTU count data from a set of diverse underlying network topologies.
SPIEC-EASI outperforms state-of-the-art methods at recovering edges and network properties on synthetic data under a variety of scenarios. SPIEC-EASI also reproducibly predicts previously unknown microbial associations using data from the American Gut Project.
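
The two-stage idea described above (a compositional data transformation followed by sparse neighborhood selection) can be illustrated with a minimal sketch. This is not the SPIEC-EASI implementation. It assumes the centered log-ratio (clr) transform as the compositional transformation, which the abstract does not name, and an arbitrary pseudocount of 1 to handle zero counts. The helper names (`clr`, `lasso_cd`, `neighborhood_selection`) and the fixed penalty `lam` are hypothetical, chosen only for illustration, and the lasso solver is a plain coordinate-descent routine rather than any package's API.

```python
import numpy as np


def clr(counts, pseudocount=1.0):
    """Centered log-ratio transform of a (samples x OTUs) count matrix.

    The pseudocount is an assumption made here to avoid log(0).
    """
    logx = np.log(np.asarray(counts, dtype=float) + pseudocount)
    # Subtract each sample's mean log-abundance (log of the geometric mean).
    return logx - logx.mean(axis=1, keepdims=True)


def lasso_cd(X, y, lam, n_iter=200):
    """Lasso via cyclic coordinate descent.

    Minimizes (1 / 2n) * ||y - X b||^2 + lam * ||b||_1.
    """
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.astype(float).copy()  # residual y - X @ beta, with beta = 0
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the partial residual.
            rho = X[:, j] @ resid / n + col_sq[j] * beta[j]
            # Soft-thresholding update for coefficient j.
            new = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            resid += X[:, j] * (beta[j] - new)
            beta[j] = new
    return beta


def neighborhood_selection(Z, lam):
    """Regress each variable on all others with the lasso.

    Returns a symmetric boolean adjacency matrix; an edge is kept if
    either of the two regressions selects it (the "OR" rule).
    Assumes no column of Z is constant.
    """
    Z = (Z - Z.mean(axis=0)) / Z.std(axis=0)
    p = Z.shape[1]
    adj = np.zeros((p, p), dtype=bool)
    for j in range(p):
        others = np.delete(np.arange(p), j)
        beta = lasso_cd(Z[:, others], Z[:, j], lam)
        adj[j, others] = beta != 0
    return adj | adj.T


# Toy usage on simulated counts (40 samples, 6 OTUs); values are arbitrary.
rng = np.random.default_rng(0)
toy_counts = rng.poisson(50, size=(40, 6))
toy_graph = neighborhood_selection(clr(toy_counts), lam=0.3)
```

In this sketch the penalty `lam` is fixed by hand; in practice the sparsity level would be chosen by a model selection procedure, which is outside the scope of the example.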
