Multivariable association discovery in population-scale meta-omics studies

It is challenging to associate features such as human health outcomes, diet, environmental conditions, or other metadata with microbial community measurements, due in part to the quantitative properties of those measurements. Microbiome multi-omics data are typically noisy, sparse (zero-inflated), high-dimensional, extremely non-normal, and often in the form of count or compositional measurements. Here we introduce an optimized combination of novel and established methodology to assess multivariable association of microbial community features with complex metadata in population-scale observational studies. Our approach, MaAsLin 2 (Microbiome Multivariable Associations with Linear Models), uses general linear models to accommodate a wide variety of modern epidemiological studies, including cross-sectional and longitudinal designs, as well as a variety of data types (e.g., counts and relative abundances) with or without covariates and repeated measurements. To construct this method, we conducted a large-scale evaluation of a broad range of scenarios under which straightforward identification of meta-omics associations can be challenging. These simulation studies reveal that MaAsLin 2’s linear model preserves statistical power in the presence of repeated measures and multiple covariates, while accounting for the nuances of meta-omics features and controlling false discovery. We also applied MaAsLin 2 to a microbial multi-omics dataset from the Integrative Human Microbiome Project (HMP2), which, in addition to reproducing established results, revealed a unique, integrated landscape of inflammatory bowel disease (IBD) across multiple time points and omics profiles.
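The false discovery control mentioned above can be illustrated with a small sketch. The abstract does not name the correction procedure, so the Benjamini-Hochberg step-up adjustment is used here as an assumption; the function name, the example p-values, and the 0.05 threshold are hypothetical and are not taken from the MaAsLin 2 software.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (q-values) in input order."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical p-values, one per microbial feature tested against a metadata variable.
p_values = [0.001, 0.008, 0.039, 0.041, 0.60]
q_values = benjamini_hochberg(p_values)

# Hypothetical significance threshold on the adjusted values.
significant = [q <= 0.05 for q in q_values]
```

In a per-feature analysis, each microbial feature would contribute one p-value from its own fitted model, and the adjustment is applied across all features at once, so the number of tests enters through `m`.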
