VIPER: Visualization Pipeline for RNA-seq, a Snakemake workflow for efficient and complete RNA-seq analysis

Background: RNA sequencing has become a ubiquitous technology used throughout the life sciences as an effective method of quantitatively measuring RNA abundance in tissues and cells. The increasing use of RNA-seq technology has led to the continuous development of new tools for every step of analysis, from alignment to downstream pathway analysis. However, using these analysis tools effectively in a scalable and reproducible way can be challenging, especially for non-experts.

Results: Using the workflow management system Snakemake, we have developed a user-friendly, fast, efficient, and comprehensive pipeline for RNA-seq analysis. VIPER (Visualization Pipeline for RNA-seq analysis) is an analysis workflow that combines some of the most popular tools to take RNA-seq analysis from raw sequencing data, through alignment and quality control, into downstream differential expression and pathway analysis. VIPER has been created in a modular fashion to allow for the rapid incorporation of new tools that expand its capabilities. This capacity has already been exploited to include very recently developed tools that explore immune infiltrate and T-cell CDR (Complementarity-Determining Regions) reconstruction abilities. The pipeline has been conveniently packaged such that minimal computational skills are required to download and install the dozens of software packages that VIPER uses.

Conclusions: VIPER is a comprehensive solution that performs most standard RNA-seq analyses quickly and effectively, with a built-in capacity for customization and expansion.
