Taxonomic Reliability of DNA Sequences in Public Sequence Databases: A Fungal Perspective

Background: DNA sequences are increasingly seen as one of the primary information sources for species identification in many organism groups. Such approaches, popularly known as barcoding, are underpinned by the assumption that the reference databases used for comparison are sufficiently complete and feature correctly and informatively annotated entries.

Methodology/Principal Findings: The present study uses a large set of fungal DNA sequences from the inclusive International Nucleotide Sequence Database to show that the taxon sampling of fungi is far from complete, that about 20% of the entries may be incorrectly identified to species level, and that the majority of entries lack descriptive and up-to-date annotations.

Conclusions: The problems with taxonomic reliability and insufficient annotations in public DNA repositories form a tangible obstacle to sequence-based species identification, and it is manifest that the greatest challenges to biological barcoding will be of a taxonomic, rather than technical, nature.
