Oqtans: the RNA-seq workbench in the cloud for complete and reproducible quantitative transcriptome analysis

We present Oqtans, an open-source workbench for quantitative transcriptome analysis, that is integrated in Galaxy. Its distinguishing features include customizable computational workflows and a modular pipeline architecture that facilitates comparative assessment of tool and data quality. Oqtans integrates an assortment of machine learning-powered tools into Galaxy, which show superior or equal performance to state-of-the-art tools. Implemented tools comprise a complete transcriptome analysis workflow: short-read alignment, transcript identification/quantification and differential expression analysis. Oqtans and Galaxy facilitate persistent storage, data exchange and documentation of intermediate results and analysis workflows. We illustrate how Oqtans aids the interpretation of data from different experiments in easy to understand use cases. Users can easily create their own workflows and extend Oqtans by integrating specific tools. Oqtans is available as (i) a cloud machine image with a demo instance at cloud.oqtans.org, (ii) a public Galaxy instance at galaxy.cbio.mskcc.org, (iii) a git repository containing all installed software (oqtans.org/git); most of which is also available from (iv) the Galaxy Toolshed and (v) a share string to use along with Galaxy CloudMan. Contact: vipin@cbio.mskcc.org, ratschg@mskcc.org Supplementary information: Supplementary data are available at Bioinformatics online.

[1]  Gunnar Rätsch,et al.  Support Vector Machines and Kernels for Computational Biology , 2008, PLoS Comput. Biol..

[2]  Mark D. Robinson,et al.  edgeR: a Bioconductor package for differential expression analysis of digital gene expression data , 2009, Bioinform..

[3]  Thomas Lengauer,et al.  Improved scoring of functional groups from gene expression data by decorrelating GO graph structure , 2006, Bioinform..

[4]  Tanya Z. Berardini,et al.  The Arabidopsis Information Resource (TAIR): improved gene annotation and new tools , 2011, Nucleic Acids Res..

[5]  Gunnar Rätsch,et al.  ARTS: accurate recognition of transcription starts in human , 2006, ISMB.

[6]  Gunnar Rätsch,et al.  Optimal spliced alignments of short sequence reads , 2008, BMC Bioinformatics.

[7]  Thomas R. Gingeras,et al.  STAR: ultrafast universal RNA-seq aligner , 2013, Bioinform..

[8]  Gunnar Rätsch,et al.  rQuant.web: a tool for RNA-Seq-based transcript quantitation , 2010, Nucleic Acids Res..

[9]  Y. Xing,et al.  Detection of splice junctions from paired-end RNA-seq data by SpliceMap , 2010, Nucleic acids research.

[10]  W. Huber,et al.  which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. MAnorm: a robust model for quantitative comparison of ChIP-Seq data sets , 2011 .

[11]  Derek Y. Chiang,et al.  MapSplice: Accurate mapping of RNA-seq reads for splice junction discovery , 2010, Nucleic acids research.

[12]  Lior Pachter,et al.  Sequence Analysis , 2020, Definitions.

[13]  Gunnar Rätsch,et al.  Statistical Tests for Detecting Differential RNA-Transcript Expression from Read Counts , 2010, ISMB 2011.

[14]  Gunnar Rätsch,et al.  Hierarchical Multitask Structured Output Learning for Large-scale Sequence Segmentation , 2011, NIPS.

[15]  Gunnar Rätsch,et al.  The SHOGUN Machine Learning Toolbox , 2010, J. Mach. Learn. Res..

[16]  Lior Pachter,et al.  Identification of novel transcripts in annotated genomes using RNA-Seq , 2011, Bioinform..

[17]  B. Williams,et al.  Mapping and quantifying mammalian transcriptomes by RNA-Seq , 2008, Nature Methods.

[18]  Gunnar Rätsch,et al.  mGene.web: a web service for accurate computational gene finding , 2009, Nucleic Acids Res..

[19]  Vipin T. Sreedharan,et al.  RNA‐Seq Read Alignments with PALMapper , 2010, Current protocols in bioinformatics.

[20]  Richard Durbin,et al.  Fast and accurate long-read alignment with Burrows–Wheeler transform , 2010, Bioinform..

[21]  G. Crooks,et al.  WebLogo: a sequence logo generator. , 2004, Genome research.

[22]  Vipin T. Sreedharan,et al.  Multiple reference genomes and transcriptomes for Arabidopsis thaliana , 2011, Nature.

[23]  Sebastian J. Schultheiß,et al.  Ten Simple Rules for Providing a Scientific Web Resource , 2011, PLoS Comput. Biol..

[24]  J. Rinn,et al.  Ab initio reconstruction of transcriptomes of pluripotent and lineage committed cells reveals gene structures of thousands of lincRNAs , 2010, Nature Biotechnology.

[25]  Gunnar Rätsch,et al.  KIRMES: kernel-based identification of regulatory modules in euchromatic sequences , 2009, BMC Bioinformatics.

[26]  References , 1971 .

[27]  Cole Trapnell,et al.  Ultrafast and memory-efficient alignment of short DNA sequences to the human genome , 2009, Genome Biology.

[28]  R. Gibbs,et al.  The Drosophila melanogaster transcriptome by paired-end RNA sequencing. , 2011, Genome research.

[29]  W. Huber,et al.  Detecting differential usage of exons from RNA-seq data , 2012, Genome research.

[30]  Gunnar Rätsch,et al.  Accurate splice site prediction using support vector machines , 2007, BMC Bioinformatics.

[31]  N. Friedman,et al.  Trinity: reconstructing a full-length transcriptome without a genome from RNA-Seq data , 2011, Nature Biotechnology.

[32]  Gonçalo R. Abecasis,et al.  The Sequence Alignment/Map format and SAMtools , 2009, Bioinform..