Five simple guidelines for establishing basic authenticity and reliability of newly generated fungal ITS sequences.

Molecular data form an important research tool in most branches of mycology. A non-trivial proportion of the public fungal DNA sequences is, however, compromised in terms of quality and reliability, contributing noise and bias to sequence-borne inferences such as phylogenetic analysis, diversity assessment, and barcoding. In this paper we discuss various aspects and pitfalls of sequence quality assessment. Based on our observations, we provide a set of guidelines to assist in manual quality management of newly generated, near-full-length (Sanger-derived) fungal ITS sequences and, to some extent, also sequences of shorter read lengths, other genes or markers, and other groups of organisms. The guidelines are intentionally non-technical and do not require substantial bioinformatics skills or significant computational power. Despite their simple nature, we feel they would have caught the vast majority of the severely compromised ITS sequences in the public corpus. Our guidelines are nevertheless not infallible, and common sense and intuition remain important elements in the pursuit of compromised sequence data. The guidelines focus on the basic authenticity and reliability of newly generated sequences, and the user may want to consider additional resources and steps to achieve the best possible quality control. A discussion of the technical resources for further sequence quality management is therefore provided in the supplementary material.
