Methods for normalizing microbiome data: An ecological perspective

1. Microbiome sequencing data often need to be normalized due to differences in read depths, and recommendations for microbiome analyses generally warn against using proportions or rarefying to normalize data and instead advocate alternatives, such as upper quartile, CSS, edgeR-TMM, or DESeq-VS. Those recommendations are, however, based on studies that focused on differential abundance testing and variance standardization, rather than community-level comparisons (i.e., beta diversity). Also, standardizing the within-sample variance across samples may suppress differences in species evenness, potentially distorting community-level patterns. Furthermore, the recommended methods use log transformations, which we expect to exaggerate the importance of differences among rare OTUs, while suppressing the importance of differences among common OTUs.

2. We tested these theoretical predictions via simulations and a real-world dataset.

3. Proportions and rarefying produced more accurate comparisons among communities and were the only methods that fully normalized read depths across samples. Additionally, upper quartile, CSS, edgeR-TMM, and DESeq-VS often masked differences among communities when common OTUs differed, and they produced false positives when rare OTUs differed.

4. Based on our simulations, normalizing via proportions may be superior to other commonly used methods for comparing ecological communities.
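The two simplest normalizations named above, and the expected effect of a log transformation, can be illustrated with a minimal sketch. It is not the simulation code used in the study: the OTU counts are made-up toy values, the helper names (`to_proportions`, `rarefy`, `bray_curtis`, `euclid`) are hypothetical, and a plain `log1p` stands in for a log transformation in general rather than for upper quartile, CSS, edgeR-TMM, or DESeq-VS specifically. Rarefying is sketched here as subsampling without replacement, which assumes the target depth does not exceed any sample's total.

```python
import numpy as np


def to_proportions(counts):
    """Scale each sample (row) so its OTU counts sum to 1."""
    counts = np.asarray(counts, dtype=float)
    return counts / counts.sum(axis=1, keepdims=True)


def rarefy(counts, depth, seed=0):
    """Randomly subsample each sample (row) to `depth` reads without replacement."""
    rng = np.random.default_rng(seed)
    counts = np.asarray(counts, dtype=np.int64)
    return np.vstack([rng.multivariate_hypergeometric(row, depth) for row in counts])


def bray_curtis(x, y):
    """Bray-Curtis dissimilarity between two abundance vectors."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return np.abs(x - y).sum() / (x + y).sum()


def euclid(x, y):
    """Euclidean distance between two abundance vectors."""
    return float(np.linalg.norm(np.asarray(x, dtype=float) - np.asarray(y, dtype=float)))


# Toy data (made up): identical composition, tenfold difference in read depth.
shallow = np.array([900, 90, 10])
deep = shallow * 10
samples = np.vstack([shallow, deep])

bc_raw = bray_curtis(shallow, deep)                   # driven by read depth alone
props = to_proportions(samples)
bc_props = bray_curtis(props[0], props[1])            # composition is identical
rarefied = rarefy(samples, depth=1000)
bc_rarefied = bray_curtis(rarefied[0], rarefied[1])   # only subsampling noise remains

# Toy data (made up): equal depth, a shift among common OTUs vs. a shift in a rare OTU.
base = np.array([500, 490, 10])
common_shift = np.array([600, 390, 10])   # 100 reads move between the two common OTUs
rare_shift = np.array([500, 499, 1])      # 9 reads move out of the rare OTU

# On proportions, the common-OTU shift is the larger difference.
d_common_prop = euclid(base / 1000, common_shift / 1000)
d_rare_prop = euclid(base / 1000, rare_shift / 1000)
# After log1p, the rare-OTU shift becomes the larger difference.
d_common_log = euclid(np.log1p(base), np.log1p(common_shift))
d_rare_log = euclid(np.log1p(base), np.log1p(rare_shift))

print(f"Bray-Curtis: raw={bc_raw:.3f} proportions={bc_props:.3f} rarefied={bc_rarefied:.3f}")
print(f"proportions: common={d_common_prop:.3f} rare={d_rare_prop:.3f}")
print(f"log1p:       common={d_common_log:.3f} rare={d_rare_log:.3f}")
```

In this toy case the raw counts look highly dissimilar purely because of read depth, while proportions and rarefied counts do not. The ranking of the two shifts reverses once the counts are log-transformed, which is the kind of distortion the abstract predicts for methods built on log transformations.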
