SAMSA2: a standalone metatranscriptome analysis pipeline

Background
Complex microbial communities are an area of growing interest in biology. Metatranscriptomics allows researchers to quantify microbial gene expression in an environmental sample via high-throughput sequencing. Metatranscriptomic experiments are computationally intensive because they generate a large volume of sequence data, and each sequence must be compared with reference sequences from thousands of organisms.

Results
SAMSA2 is an upgrade to the original Simple Annotation of Metatranscriptomes by Sequence Analysis (SAMSA) pipeline, redesigned for standalone use on a supercomputing cluster. SAMSA2 is faster because it uses the DIAMOND aligner, and it is more flexible and reproducible because it relies on local databases. SAMSA2 is distributed with detailed documentation, example input and output files, and example master scripts for executing the full pipeline.

Conclusions
SAMSA2 is a rapid and efficient metatranscriptome pipeline for analyzing large RNA-seq datasets in a supercomputing cluster environment. SAMSA2 provides simplified output that can be examined directly or used for further analyses, and its reference databases can be upgraded, altered, or customized to fit the needs of each experiment.
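To illustrate what turning alignments into "simplified output" can involve, here is a minimal sketch, not SAMSA2's own code, that collapses DIAMOND alignments into per-reference read counts and relative abundances. It assumes DIAMOND's default 12-column BLAST tabular output (--outfmt 6: qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore) and a simple best-hit-per-read rule; the function names and the toy read and protein IDs are hypothetical.

```python
"""Hypothetical sketch: collapse DIAMOND tabular hits into per-reference read counts."""
from collections import Counter
from typing import Dict, Iterable, Tuple


def best_hits(lines: Iterable[str]) -> Dict[str, Tuple[str, float]]:
    """Keep the highest-bitscore subject for each query read.

    Assumes the 12-column BLAST tabular layout (DIAMOND --outfmt 6 default).
    On equal bitscores, the first hit seen for a read is kept.
    """
    best: Dict[str, Tuple[str, float]] = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        if len(fields) < 12:
            continue  # skip malformed rows rather than guessing columns
        query, subject, bitscore = fields[0], fields[1], float(fields[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return best


def count_reads_per_reference(lines: Iterable[str]) -> Counter:
    """Count reads assigned to each reference sequence via their best hit."""
    return Counter(subject for subject, _ in best_hits(lines).values())


def relative_abundance(counts: Counter) -> Dict[str, float]:
    """Convert raw counts to percentages of all annotated reads."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {ref: 100.0 * n / total for ref, n in counts.items()}


if __name__ == "__main__":
    # Toy input with made-up read and reference IDs (not real SAMSA2 data).
    example = [
        "read1\tprotA\t98.0\t50\t1\t0\t1\t150\t1\t50\t1e-20\t95.0",
        "read1\tprotB\t80.0\t50\t10\t0\t1\t150\t1\t50\t1e-10\t60.0",
        "read2\tprotA\t95.0\t48\t2\t0\t1\t144\t3\t50\t1e-18\t88.0",
        "read3\tprotC\t90.0\t45\t4\t1\t1\t135\t5\t49\t1e-15\t70.0",
    ]
    counts = count_reads_per_reference(example)
    print(counts.most_common())  # [('protA', 2), ('protC', 1)]
    print(relative_abundance(counts))
```

Keeping only the best hit per read avoids counting one read toward several references; other assignment rules (for example, splitting ambiguous reads across tied hits) are equally plausible and would change the counts.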
