Taxonomic classification and abundance estimation using 16S and WGS: A comparison using controlled reference samples

Assessing microbiome biodiversity is the most common application of metagenomics. While 16S sequencing remains the standard procedure for taxonomic profiling of metagenomic data, a growing number of studies have demonstrated biases associated with this method. Whole Genome Shotgun (WGS) sequencing alleviates most of the known limitations of 16S data. However, because of its computationally intensive data analysis and higher sequencing costs, WGS-based metagenomics remains the less popular option. Selecting the experiment type that provides a comprehensive yet manageable amount of information is a challenge in many metagenomics studies. In this work, we created a series of artificial bacterial mixes, each with a different distribution of skin-associated microbial species. These mixes were used to estimate the resolution of two metagenomic experiment types, 16S and WGS, and to evaluate several bioinformatics approaches for taxonomic read classification. In all test cases, WGS approaches provided much more accurate results than 16S, in terms of both taxa prediction and abundance estimation. Furthermore, we demonstrate that a 16S dataset, analysed with different state-of-the-art techniques and reference databases, can produce widely different results. These results are especially important given that most forensic metagenomic analyses are still performed using 16S data.
