RIEMS: a software pipeline for sensitive and comprehensive taxonomic classification of reads from metagenomics datasets

Background: Fuelled by the advent and subsequent development of next generation sequencing technologies, metagenomics became a powerful tool for the analysis of microbial communities both scientifically and diagnostically. The biggest challenge is the extraction of relevant information from the huge sequence datasets generated for metagenomics studies. Although a plethora of tools are available, data analysis is still a bottleneck.

Results: To overcome the bottleneck of data analysis, we developed an automated computational workflow called RIEMS (Reliable Information Extraction from Metagenomic Sequence datasets). RIEMS assigns every individual read sequence within a dataset taxonomically by cascading different sequence analyses with decreasing stringency of the assignments using various software applications. After completion of the analyses, the results are summarised in a clearly structured result protocol organised taxonomically. The high accuracy and performance of RIEMS analyses were proven in comparison with other tools for metagenomics data analysis using simulated sequencing read datasets.

Conclusions: RIEMS has the potential to fill the gap that still exists with regard to data analysis for metagenomics studies. The usefulness and power of RIEMS for the analysis of genuine sequencing datasets was demonstrated with an early version of RIEMS in 2011 when it was used to detect the orthobunyavirus sequences leading to the discovery of Schmallenberg virus.
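The cascading idea mentioned in the Results (try the most stringent analysis first, then pass only the still-unassigned reads to progressively less stringent ones) can be illustrated with a minimal Python sketch. This is not the RIEMS implementation: the abstract does not name the tools or thresholds used, so the toy reference sequences, the position-wise `identity` score, and the names `make_stage` and `classify_cascade` are all hypothetical stand-ins for the real alignment software.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical toy reference set: taxon name -> reference sequence.
REFERENCES: Dict[str, str] = {
    "taxon_A": "ACGTACGTAC",
    "taxon_B": "TTGACCGGTA",
}


def identity(a: str, b: str) -> float:
    """Fraction of matching positions; 0.0 if lengths differ (toy stand-in for an aligner)."""
    if len(a) != len(b) or not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def make_stage(min_identity: float) -> Callable[[str], Optional[str]]:
    """Build one analysis stage that accepts hits at or above min_identity."""
    def stage(read: str) -> Optional[str]:
        best_taxon, best_score = None, 0.0
        for taxon, ref in REFERENCES.items():
            score = identity(read, ref)
            if score >= min_identity and score > best_score:
                best_taxon, best_score = taxon, score
        return best_taxon
    return stage


def classify_cascade(
    reads: Dict[str, str], thresholds: List[float]
) -> Tuple[Dict[str, Tuple[str, float]], List[str]]:
    """Run stages from most to least stringent; each read keeps its first assignment."""
    assignments: Dict[str, Tuple[str, float]] = {}
    unassigned = list(reads)  # read IDs still waiting for an assignment
    for threshold in sorted(thresholds, reverse=True):
        stage = make_stage(threshold)
        remaining = []
        for read_id in unassigned:
            taxon = stage(reads[read_id])
            if taxon is None:
                remaining.append(read_id)  # passed on to the next, looser stage
            else:
                assignments[read_id] = (taxon, threshold)
        unassigned = remaining
    return assignments, unassigned


if __name__ == "__main__":
    demo_reads = {"r1": "ACGTACGTAC", "r2": "TTGACCGGTT", "r3": "GGGGGGGGGG"}
    assigned, leftover = classify_cascade(demo_reads, [1.0, 0.8])
    print(assigned)
    print(leftover)
```

Recording the threshold alongside each taxon keeps track of how stringent the stage was that produced the assignment, which is the kind of information a taxonomically organised result summary could report per read.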
