Expanding the taxonomic range in the fecal metagenome

Background

Except for bacteria, the taxonomic diversity of the human fecal metagenome has not been widely studied, despite the potential importance of viruses and eukaryotes. Widely used bioinformatic tools contain limited numbers of non-bacterial species in their databases compared to available genomic sequences, and their methodologies do not favour classification of rare sequences which may represent only a small fraction of their parent genome. In seeking to optimise identification of non-bacterial species, we evaluated five widely used metagenome classifier programs (BURST, Kraken2, Centrifuge, MetaPhlAn2 and CCMetagen) for their ability to correctly assign and count simulations of bacterial, viral and eukaryotic DNA sequence reads, including the effect of taxonomic order of analysis of bacteria, viruses and eukaryotes and the effect of sequencing depth.

Results

We found that the precision of metagenome classifiers varied significantly between programs and between taxonomic groups. When classifying viruses and eukaryotes, ordering the analysis such that bacteria were classified first significantly improved classification precision. Increasing sequencing depth decreased classification precision and did not improve recall of rare species.

Conclusions

Choice of metagenome classifier program can have a marked effect on results with respect to precision of species assignment in different taxonomic groups. The order of taxonomic classification can markedly improve precision. Increasing sequencing depth can decrease classification precision and yields diminishing returns in probability of species detection.
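As a rough illustration of the two metrics named in the abstract, the sketch below scores read-level species assignments against a simulated ground truth. The definitions used here are an assumption (precision as correct assignments over all classified reads, recall as correct assignments over all simulated reads); the abstract does not state the exact formulas, and the function name, read identifiers and species labels are hypothetical.

```python
def precision_recall(truth, predicted):
    """Score read-level species assignments against simulated ground truth.

    truth:     dict mapping read ID -> true species name
    predicted: dict mapping read ID -> assigned species name, or None if unclassified
    Assumed definitions (not taken from the paper):
      precision = correct assignments / reads that received any assignment
      recall    = correct assignments / all simulated reads
    """
    classified = {r: s for r, s in predicted.items() if s is not None}
    correct = sum(1 for r, s in classified.items() if truth.get(r) == s)
    precision = correct / len(classified) if classified else 0.0
    recall = correct / len(truth) if truth else 0.0
    return precision, recall


# Hypothetical toy data: four simulated reads, one misassigned, one unclassified.
truth = {
    "r1": "Escherichia coli",
    "r2": "Escherichia coli",
    "r3": "Candida albicans",
    "r4": "crAssphage",
}
predicted = {
    "r1": "Escherichia coli",
    "r2": "Shigella flexneri",  # wrong species
    "r3": "Candida albicans",
    "r4": None,                 # left unclassified
}

precision, recall = precision_recall(truth, predicted)
print(f"precision={precision:.3f} recall={recall:.3f}")
```

Under these definitions, an unclassified read lowers recall but not precision, while a misassigned read lowers both.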
