RNASeqR: an R package for automated two-group RNA-Seq analysis workflow

RNA-Seq analysis has revolutionized researchers’ understanding of the transcriptome in biological research. Assessing differences in transcriptomic profiles between tissue samples or patient groups enables researchers to explore the biological impact of transcriptional changes. RNA-Seq analysis requires multiple processing steps and substantial computational resources. Many well-developed R packages exist for individual steps. However, few R/Bioconductor packages integrate existing software tools into a comprehensive RNA-Seq analysis that delivers end-to-end results in a pure R environment, allowing researchers to quickly and easily extract fundamental information from large sequencing datasets. To address this need, we have developed the open-source R/Bioconductor package RNASeqR. It allows users to run an automated RNA-Seq analysis in only six steps, producing essential tabular and graphical results for further biological interpretation. Key features of RNASeqR include a six-step analysis, comprehensive visualization, a background-execution version, and integration of both R and command-line software. RNASeqR provides a fast, lightweight, and easy-to-run RNA-Seq analysis pipeline in a pure R environment. It lets users efficiently run popular software tools, both R/Bioconductor packages and command-line programs, without predefining resources or environments. RNASeqR is freely available for Linux and macOS from Bioconductor (https://bioconductor.org/packages/release/bioc/html/RNASeqR.html).
