An open source chimera checker for the fungal ITS region

The internal transcribed spacer (ITS) region of the nuclear ribosomal repeat unit is central to determining the taxonomic affiliation of fungi recovered through environmental sampling. Newly generated fungal ITS sequences are typically compared against the International Nucleotide Sequence Databases, using the sequence similarity software suite BLAST, to obtain a species or genus name. Such searches are not without complications, however. One of them is the presence of chimeric entries among the query or reference sequences. Chimeras are artificial sequences, generated unintentionally during the polymerase chain reaction step, that combine sequence data from two (or possibly more) distinct species. Available software solutions for chimera control do not readily target the fungal ITS region. The present study therefore introduces a BLAST-based open-source software package (available at http://www.emerencia.org/chimerachecker.html) that examines newly generated fungal ITS sequences for potentially chimeric elements in batch mode. We ran the software package on a random set of 12,300 environmental fungal ITS sequences from the public sequence databases and, after manual verification of the results, found 1.5% of the entries to be chimeric at the ordinal level. The proportion of chimeras in the sequence databases can be hypothesized to increase as emerging sequencing technologies that draw on pooled DNA samples become important tools in molecular ecology research.
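The abstract does not describe how the package decides that a sequence is chimeric. As a rough illustration only, the sketch below shows one plausible form a BLAST-based check at the ordinal level could take: each query is assumed to have been split into segments (labelled ITS1 and ITS2 here), each segment searched separately, and the taxonomic order of the best hit recorded. A query is flagged when confident hits for different segments point to different orders. The `SegmentHit` record, the function names, the 90% identity cutoff, and the example data are all hypothetical and are not taken from the published software.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class SegmentHit:
    """Best hit for one segment of a query sequence (hypothetical record)."""
    segment: str           # label of the query segment, e.g. "ITS1" or "ITS2"
    order: Optional[str]   # taxonomic order of the best hit; None if unknown
    identity: float        # percent identity of the best hit


def flag_chimera(hits: List[SegmentHit], min_identity: float = 90.0) -> bool:
    """Return True if confident segment hits disagree at the ordinal level.

    Hits without an order annotation, or below the (assumed) identity
    cutoff, are ignored rather than treated as evidence of a chimera.
    """
    orders = {
        h.order
        for h in hits
        if h.order is not None and h.identity >= min_identity
    }
    return len(orders) > 1


def screen_batch(
    batch: Dict[str, List[SegmentHit]], min_identity: float = 90.0
) -> List[str]:
    """Return the sorted IDs of all queries flagged as potential chimeras."""
    return sorted(
        query_id
        for query_id, hits in batch.items()
        if flag_chimera(hits, min_identity)
    )


# Made-up example data, for illustration only.
example_batch = {
    "seq_A": [SegmentHit("ITS1", "Agaricales", 97.0),
              SegmentHit("ITS2", "Agaricales", 98.0)],
    "seq_B": [SegmentHit("ITS1", "Agaricales", 96.0),
              SegmentHit("ITS2", "Helotiales", 95.0)],
    "seq_C": [SegmentHit("ITS1", "Russulales", 99.0),
              SegmentHit("ITS2", "Boletales", 70.0)],  # weak hit, ignored
}

flagged = screen_batch(example_batch)
print(flagged)
```

In this sketch only `seq_B` is flagged: `seq_A` agrees across segments, and the conflicting hit for `seq_C` falls below the cutoff. Any automated flag of this kind is a candidate for follow-up, consistent with the manual verification step mentioned in the abstract.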
