Consistent and correctable bias in metagenomic sequencing experiments

Measurements of biological communities by marker-gene and metagenomic sequencing are biased: the measured relative abundances of taxa or their genes are systematically distorted from their true values because each step in the experimental workflow preferentially detects some taxa over others. Bias can lead to qualitatively incorrect conclusions and makes measurements from different protocols quantitatively incomparable. A rigorous understanding of bias is therefore essential. Here we propose, test, and apply a simple mathematical model of how bias distorts marker-gene and metagenomics measurements: bias multiplies the true relative abundances within each sample by taxon- and protocol-specific factors that describe the different efficiencies with which taxa are detected by the workflow. Critically, these factors are consistent across samples with different compositions, allowing bias to be estimated and corrected. We validate this model in 16S rRNA gene and shotgun metagenomics data from bacterial communities with defined compositions. We use it to reason about the effects of bias on downstream statistical analyses, finding that analyses based on taxon ratios are less sensitive to bias than analyses based on taxon proportions. Finally, we demonstrate how this model can be used to quantify bias from samples of defined composition, to partition bias into steps such as DNA extraction and PCR amplification, and to correct biased measurements. Our model improves on previous models by providing a better fit to experimental data and by providing a composition-independent approach to analyzing, measuring, and correcting bias.
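The multiplicative model described in the abstract can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration: the three-taxon communities, the efficiency values, and the function names (`close`, `apply_bias`, `estimate_bias`, `correct_bias`) are hypothetical and are not taken from the paper or from any published package. The estimator shown (a centered log-ratio average over noise-free control samples in which every taxon is present) is one simple way to recover efficiencies up to a constant factor, not necessarily the authors' procedure.

```python
import numpy as np

def close(x):
    """Normalize non-negative values to proportions that sum to 1."""
    x = np.asarray(x, dtype=float)
    return x / x.sum()

def apply_bias(true_props, efficiencies):
    """Multiply true proportions by per-taxon efficiencies, then renormalize."""
    return close(np.asarray(true_props, dtype=float) * np.asarray(efficiencies, dtype=float))

def estimate_bias(true_samples, observed_samples):
    """Estimate relative efficiencies from control samples of known composition.

    Rows are samples, columns are taxa; all taxa must be present in every sample.
    The result is scaled so that its geometric mean is 1, because only the
    ratios between efficiencies are identifiable from relative abundances.
    """
    log_ratio = np.log(observed_samples) - np.log(true_samples)
    # Centering each row removes the sample-specific normalization constant.
    log_ratio = log_ratio - log_ratio.mean(axis=1, keepdims=True)
    return np.exp(log_ratio.mean(axis=0))

def correct_bias(observed_props, efficiencies):
    """Divide observed proportions by the efficiencies, then renormalize."""
    return close(np.asarray(observed_props, dtype=float) / np.asarray(efficiencies, dtype=float))

# Hypothetical efficiencies for three taxa (geometric mean is 1).
efficiencies = np.array([1.0, 4.0, 0.25])

# Two hypothetical control samples with different known compositions.
true_samples = np.array([[0.5, 0.3, 0.2],
                         [0.1, 0.1, 0.8]])
observed_samples = np.array([apply_bias(t, efficiencies) for t in true_samples])

# Error in the ratio of taxon 2 to taxon 1: the same in both samples.
ratio_error = (observed_samples[:, 1] / observed_samples[:, 0]) / (
    true_samples[:, 1] / true_samples[:, 0])

# Error in the proportion of taxon 1: depends on the rest of the sample.
proportion_error = observed_samples[:, 0] / true_samples[:, 0]

estimated = estimate_bias(true_samples, observed_samples)
corrected = np.array([correct_bias(o, estimated) for o in observed_samples])

print("ratio error per sample:     ", ratio_error)
print("proportion error per sample:", proportion_error)
print("estimated efficiencies:     ", estimated)
print("corrected compositions:\n", corrected)
```

In this noise-free toy setting, the ratio error is identical across the two samples while the proportion error is not, which is the sense in which ratio-based analyses are less sensitive to bias. Real data would add sampling noise and taxa absent from some controls, which this sketch does not handle.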
