SPARTA: Simple Program for Automated reference-based bacterial RNA-seq Transcriptome Analysis

Background: Many tools exist for the analysis of bacterial RNA sequencing (RNA-seq) transcriptional profiling experiments to identify differentially expressed genes between experimental conditions. Generally, the workflow includes quality control of reads, mapping to a reference, counting transcript abundance, and statistical tests for differentially expressed genes. Despite the numerous tools developed for each component of an RNA-seq analysis workflow, easy-to-use, bacterially oriented workflow applications that combine multiple tools and automate the process are lacking. With many tools to choose from for each step, identifying a specific tool, adapting its input/output options to the specific use case, and integrating the tools into a coherent analysis pipeline is not a trivial endeavor, particularly for microbiologists with limited bioinformatics experience.

Results: To make bacterial RNA-seq data analysis more accessible, we developed a Simple Program for Automated reference-based bacterial RNA-seq Transcriptome Analysis (SPARTA). SPARTA is a reference-based bacterial RNA-seq analysis workflow application for single-end Illumina reads. SPARTA is turnkey software that simplifies the analysis of RNA-seq data sets, making bacterial RNA-seq analysis a routine process that can be undertaken on a personal computer or in the classroom. The easy-to-install, complete workflow processes whole transcriptome shotgun sequencing data files by trimming reads and removing adapters, mapping reads to a reference, counting gene features, calculating differential gene expression, and, importantly, checking for potential batch effects within the data set. SPARTA outputs quality analysis reports, gene feature counts, and differential gene expression tables and scatterplots.

Conclusions: SPARTA provides an easy-to-use bacterial RNA-seq transcriptional profiling workflow to identify differentially expressed genes between experimental conditions. This software will enable microbiologists with limited bioinformatics experience to analyze their data and integrate next-generation sequencing (NGS) technologies into the classroom. The SPARTA software and tutorial are available at sparta.readthedocs.org.
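
To make the last two workflow stages (gene feature counts feeding a differential expression calculation) more concrete, the following is a minimal illustrative sketch and not SPARTA's implementation. It assumes per-gene read counts are already available for two conditions, scales them to counts per million, and reports a log2 fold change. The function names, the pseudocount, and the toy counts are all hypothetical, and a real analysis would use a proper statistical model with replicates rather than a bare ratio.

```python
import math


def counts_per_million(counts):
    """Scale raw per-gene read counts to counts per million (CPM)."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("library has no counted reads")
    return {gene: 1e6 * n / total for gene, n in counts.items()}


def log2_fold_change(control, treated, pseudocount=1.0):
    """Log2 ratio of treated to control CPM for each gene.

    The pseudocount (an assumption of this sketch) avoids division by
    zero for genes with no reads in one condition.
    """
    cpm_control = counts_per_million(control)
    cpm_treated = counts_per_million(treated)
    return {
        gene: math.log2(
            (cpm_treated[gene] + pseudocount) / (cpm_control[gene] + pseudocount)
        )
        for gene in control
    }


# Toy counts for three hypothetical genes in two conditions.
control = {"geneA": 100, "geneB": 300, "geneC": 600}
treated = {"geneA": 400, "geneB": 300, "geneC": 300}

lfc = log2_fold_change(control, treated)
for gene, value in sorted(lfc.items()):
    print(f"{gene}\t{value:+.2f}")
```

Scaling by library size before taking the ratio keeps differences in sequencing depth from being read as expression changes. A ratio like this does not test significance, and it does not address batch effects, which the workflow checks for separately.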
