Factors That Affect Large Subunit Ribosomal DNA Amplicon Sequencing Studies of Fungal Communities: Classification Method, Primer Choice, and Error

Nuclear large subunit ribosomal DNA (LSU rDNA) is widely used in fungal phylogenetics and, increasingly, in amplicon-based environmental sequencing. The relatively short reads produced by next-generation sequencing, however, make primer choice and sequence error important variables for obtaining accurate taxonomic classifications. In this simulation study we tested the performance of three classification methods: 1) a similarity-based method (BLAST + Metagenomic Analyzer, MEGAN); 2) a composition-based method (Ribosomal Database Project naïve Bayesian classifier, NBC); and 3) a phylogeny-based method (Statistical Assignment Package, SAP). We also tested the effects of sequence length, primer choice, and sequence error on classification accuracy and perceived community composition. Using a leave-one-out cross validation approach, results for classifications to the genus rank were as follows: BLAST + MEGAN had the lowest error rate and was particularly robust to sequence error; SAP accuracy was highest when long LSU query sequences were classified; and NBC ran significantly faster than the other tested methods. All methods performed poorly with the shortest sequences (50–100 bp). Increasing simulated sequence error reduced classification accuracy. Sequence error and primer selection produced apparent community shifts even though the underlying community composition did not change. Short-read datasets from individual primers, as well as pooled datasets, appear only to approximate the true community composition. We hope this work informs investigators of some of the factors that affect the quality and interpretation of their environmental gene surveys.
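As a rough illustration of the evaluation design described above, the following minimal Python sketch runs a leave-one-out cross validation with optional simulated substitution error. It is not the study's pipeline: the k-mer overlap classifier is a toy stand-in for BLAST + MEGAN, NBC, and SAP, and the sequences, genus labels, function names, and the uniform substitution error model are all hypothetical choices made for the example.

```python
import random

# Hypothetical toy reference set: (genus label, sequence). Not real LSU data.
TOY_REFERENCES = [
    ("GenusA", "ACGTACGTTAGCCGATAGCTAGGCTA"),
    ("GenusA", "ACGTACGTTAGCCGATAGCTAGGCTT"),
    ("GenusB", "TTGACCAGTTGACAGGTCAAGTCCAG"),
    ("GenusB", "TTGACCAGTTGACAGGTCAAGTCCAA"),
    ("GenusC", "GGCATATCGCGCATTATACGGCGCTA"),
    ("GenusC", "GGCATATCGCGCATTATACGGCGCTT"),
]


def kmers(seq, k=4):
    """Return the set of overlapping k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def classify(query, references, k=4):
    """Toy classifier: label of the reference with the highest k-mer Jaccard index."""
    q = kmers(query, k)
    best_label, best_score = None, -1.0
    for label, ref in references:
        r = kmers(ref, k)
        union = q | r
        score = len(q & r) / len(union) if union else 0.0
        if score > best_score:  # ties keep the first reference seen
            best_label, best_score = label, score
    return best_label


def add_errors(seq, rate, rng):
    """Substitute each base with a different base at the given per-base rate."""
    out = []
    for base in seq:
        if rng.random() < rate:
            out.append(rng.choice([b for b in "ACGT" if b != base]))
        else:
            out.append(base)
    return "".join(out)


def leave_one_out_accuracy(references, error_rate=0.0, seed=0, k=4):
    """Hold out each sequence in turn, classify it against the rest, score genus calls."""
    rng = random.Random(seed)
    correct = 0
    for i, (label, seq) in enumerate(references):
        remaining = references[:i] + references[i + 1:]  # query removed from reference set
        query = add_errors(seq, error_rate, rng)
        if classify(query, remaining, k) == label:
            correct += 1
    return correct / len(references)


if __name__ == "__main__":
    for rate in (0.0, 0.05, 0.2):
        acc = leave_one_out_accuracy(TOY_REFERENCES, error_rate=rate)
        print(f"error rate {rate:.2f}: genus-level accuracy {acc:.2f}")
```

Removing the query from the reference set before classifying it is the key step: it prevents a sequence from trivially matching itself, so the score reflects how well a method places a sequence that is absent from the database.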
