The RNA workbench: best practices for RNA and high-throughput sequencing bioinformatics in Galaxy

Abstract: RNA-based regulation has become a major research topic in molecular biology. The analysis of epigenetic and expression data is therefore incomplete if RNA-based regulation is not taken into account. Thus, it is increasingly important, but not yet standard, to combine RNA-centric data and analysis tools with other types of experimental data such as RNA-seq or ChIP-seq. Here, we present the RNA workbench, a comprehensive set of analysis tools and consolidated workflows that enable the researcher to combine these two worlds. Based on the Galaxy framework, the workbench guarantees simple access, easy extension, flexible adaptation to personal and security needs, and sophisticated analyses that are independent of command-line knowledge. Currently, it includes more than 50 bioinformatics tools dedicated to different research areas of RNA biology, including RNA structure analysis, RNA alignment, RNA annotation, RNA-protein interaction, ribosome profiling, RNA-seq analysis and RNA target prediction. The workbench is developed and maintained by experts in RNA bioinformatics and the Galaxy framework. Together with the growing community evolving around this workbench, we are committed to keeping the workbench up to date with future standards and needs, providing researchers with a reliable and robust framework for RNA data analysis.

Availability: The RNA workbench is available at https://github.com/bgruening/galaxy-rna-workbench.
