Improved software detection and extraction of ITS1 and ITS2 from ribosomal ITS sequences of fungi and other eukaryotes for analysis of environmental sequencing data

Summary

1. The nuclear ribosomal internal transcribed spacer (ITS) region is the primary choice for molecular identification of fungi. Its two highly variable spacers (ITS1 and ITS2) are usually species specific, whereas the intercalary 5.8S gene is highly conserved. For sequence clustering and BLAST searches, it is often advantageous to rely on either of the variable spacers and to exclude the conserved 5.8S gene. Identifying and extracting ITS1 and ITS2 from large taxonomic and environmental data sets is, however, often difficult, and many ITS sequences are incorrectly delimited in the public sequence databases.

2. We introduce ITSx, a Perl-based software tool to extract ITS1, 5.8S and ITS2, as well as full-length ITS sequences, from both Sanger and high-throughput sequencing data sets. ITSx uses hidden Markov models computed from large alignments of a total of 20 groups of eukaryotes, including fungi, metazoans and plants. Sequence extraction is based on the predicted positions of the ribosomal genes in the sequences.

3. ITSx has a very high proportion of true-positive extractions and a low proportion of false-positive extractions. Additionally, process parallelization permits expedient analyses of very large data sets, such as a one-million-sequence amplicon pyrosequencing data set. ITSx is rich in features and written to be easily incorporated into automated sequence analysis pipelines.

4. ITSx paves the way for more sensitive BLAST searches and sequence clustering operations for the ITS region in eukaryotes. The software also permits elimination of non-ITS sequences from any data set. This is particularly useful for amplicon-based next-generation sequencing data sets, where insidious non-target sequences are often found among the target sequences. Such non-target sequences are difficult to find by other means and would contribute noise to diversity estimates if left in the data set.
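The position-based extraction described in point 2 can be illustrated with a minimal Python sketch. It is not the ITSx implementation (ITSx is written in Perl and derives the gene positions from hidden Markov model hits). The function name `extract_its_regions`, its parameters and the coordinate convention (0-based, half-open) are assumptions made for illustration, and the gene boundaries are taken as already predicted.

```python
from typing import Dict, Optional


def extract_its_regions(
    seq: str,
    ssu_end: Optional[int],
    s58_start: Optional[int],
    s58_end: Optional[int],
    lsu_start: Optional[int],
) -> Dict[str, str]:
    """Slice ITS1, 5.8S, ITS2 and the full ITS region out of `seq`.

    Hypothetical helper; coordinates are 0-based and half-open:
      ssu_end   -- index just past the last base of the SSU (18S) gene
      s58_start -- index of the first base of the 5.8S gene
      s58_end   -- index just past the last base of the 5.8S gene
      lsu_start -- index of the first base of the LSU (28S) gene
    A value of None means that boundary was not detected, so any region
    that depends on it is left out of the result.
    """
    regions: Dict[str, str] = {}

    # ITS1 lies between the end of SSU and the start of 5.8S.
    if ssu_end is not None and s58_start is not None and ssu_end < s58_start:
        regions["ITS1"] = seq[ssu_end:s58_start]

    # The 5.8S gene itself.
    if s58_start is not None and s58_end is not None and s58_start < s58_end:
        regions["5.8S"] = seq[s58_start:s58_end]

    # ITS2 lies between the end of 5.8S and the start of LSU.
    if s58_end is not None and lsu_start is not None and s58_end < lsu_start:
        regions["ITS2"] = seq[s58_end:lsu_start]

    # The full-length ITS region spans ITS1, 5.8S and ITS2.
    if ssu_end is not None and lsu_start is not None and ssu_end < lsu_start:
        regions["full_ITS"] = seq[ssu_end:lsu_start]

    return regions


# Toy sequence (not real biological data): SSU=AAAA, ITS1=CCCCC,
# 5.8S=GGG, ITS2=TTTTTT, LSU=AA.
toy = "AAAA" + "CCCCC" + "GGG" + "TTTTTT" + "AA"
full = extract_its_regions(toy, ssu_end=4, s58_start=9, s58_end=12, lsu_start=18)

# A read that ends before the LSU gene: ITS2 and full ITS cannot be delimited.
partial = extract_its_regions(toy[:14], ssu_end=4, s58_start=9, s58_end=12, lsu_start=None)
```

In this sketch a region is reported only when both of its flanking boundaries are known. This mirrors the idea that a spacer can be delimited only relative to detected ribosomal genes, and that a sequence with no detectable gene would yield no regions at all.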
