Accurate, Rapid Taxonomic Classification of Fungal Large-Subunit rRNA Genes

ABSTRACT

Taxonomic and phylogenetic fingerprinting based on sequence analysis of gene fragments from the large-subunit rRNA (LSU) gene or the internal transcribed spacer (ITS) region is becoming an integral part of fungal classification. The lack of an accurate and robust classification tool trained on a validated sequence database for the taxonomic placement of fungal LSU genes severely limits taxonomic analysis of fungal isolates and of large data sets obtained from environmental surveys. Using a hand-curated set of 8,506 fungal LSU gene fragments, we determined the performance characteristics of a naïve Bayesian classifier across multiple taxonomic levels and compared its performance to that of a sequence similarity-based (BLASTN) approach. The naïve Bayesian classifier was more than 460-fold faster than the BLASTN approach on our system, and it provided equal or superior classification accuracy. Classifier accuracies were compared using sequence fragments of 100 bp and 400 bp and two different PCR primer anchor points to mimic the read lengths commonly obtained with current high-throughput sequencing technologies. Accuracy was higher with 400-bp reads than with 100-bp reads, and it was also significantly affected by sequence location across the 1,400-bp test region. The highest accuracy was obtained with fragments spanning either the D1 or the D2 variable region. The naïve Bayesian classifier provides an effective and rapid means to classify fungal LSU sequences from large environmental surveys. The training set and tool are publicly available through the Ribosomal Database Project (http://rdp.cme.msu.edu/classifier/classifier.jsp).
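The abstract does not describe the classifier's internals. The following is a minimal Python sketch of a k-mer naïve Bayesian classifier, modeled loosely on the approach of the RDP naïve Bayesian classifier originally described for bacterial 16S rRNA sequences. It assumes 8-mer word features, a smoothed word prior (n(w) + 0.5) / (N + 1), per-genus word probabilities (m(w) + P(w)) / (M + 1), equal genus priors, and a bootstrap confidence computed from random one-eighth subsets of the query's words. These details, the names `NaiveBayesClassifier`, `words`, and `read_fragment`, and the genus-only assignment are illustrative assumptions. The sketch is not the RDP implementation and does not use the fungal LSU training set described here.

```python
import math
import random
from collections import defaultdict

K = 8  # assumed word (k-mer) length, following the RDP-style 8-mer features


def words(seq, k=K):
    """Return the set of distinct overlapping k-mers in a DNA sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def read_fragment(seq, start, length):
    """Hypothetical helper: mimic a read of `length` bp starting at a primer anchor `start`."""
    return seq[start:start + length]


class NaiveBayesClassifier:
    """Illustrative genus-level k-mer naive Bayesian classifier (assumed design)."""

    def __init__(self, k=K):
        self.k = k
        self.n_seqs = 0              # N: total number of training sequences
        self.word_counts = {}        # n(w): training sequences containing word w
        self.genus_sizes = {}        # M: training sequences per genus
        self.genus_word_counts = {}  # m(w): per-genus sequences containing word w

    def train(self, training):
        """training: iterable of (sequence, genus) pairs."""
        word_counts = defaultdict(int)
        genus_sizes = defaultdict(int)
        genus_word_counts = defaultdict(lambda: defaultdict(int))
        n_seqs = 0
        for seq, genus in training:
            n_seqs += 1
            genus_sizes[genus] += 1
            for w in words(seq, self.k):
                word_counts[w] += 1
                genus_word_counts[genus][w] += 1
        self.n_seqs = n_seqs
        self.word_counts = dict(word_counts)
        self.genus_sizes = dict(genus_sizes)
        self.genus_word_counts = {g: dict(c) for g, c in genus_word_counts.items()}

    def _word_prior(self, w):
        # P(w) = (n(w) + 0.5) / (N + 1): smoothed fraction of training sequences containing w
        return (self.word_counts.get(w, 0) + 0.5) / (self.n_seqs + 1)

    def _log_score(self, query_words, genus):
        # log P(query | genus) = sum over words of log((m(w) + P(w)) / (M + 1))
        counts = self.genus_word_counts[genus]
        denom = self.genus_sizes[genus] + 1
        return sum(math.log((counts.get(w, 0) + self._word_prior(w)) / denom)
                   for w in query_words)

    def _best_genus(self, query_words):
        # Equal genus priors are assumed, so the genus with the highest likelihood wins
        return max(self.genus_sizes, key=lambda g: self._log_score(query_words, g))

    def classify(self, seq, n_boot=100, seed=0):
        """Return (genus, bootstrap confidence in [0, 1])."""
        query_words = sorted(words(seq, self.k))
        if not query_words:
            raise ValueError("sequence shorter than the word length")
        best = self._best_genus(query_words)
        rng = random.Random(seed)
        subset_size = max(1, len(query_words) // 8)  # one-eighth of the words per trial
        hits = 0
        for _ in range(n_boot):
            # Words are drawn with replacement for each bootstrap trial
            sample = [rng.choice(query_words) for _ in range(subset_size)]
            if self._best_genus(sample) == best:
                hits += 1
        return best, hits / n_boot
```

After `clf = NaiveBayesClassifier()` and `clf.train(pairs)` on (sequence, genus) pairs, a call such as `clf.classify(read_fragment(seq, start, 100))` versus `clf.classify(read_fragment(seq, start, 400))` gives a rough way to compare short and long simulated reads from the same anchor point. Shorter reads yield fewer words, and therefore smaller bootstrap subsets, which is one plausible reason accuracy tends to drop with read length.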
