A Survey of Bioinformatics-Based Tools in RNA-Sequencing (RNA-Seq) Data Analysis

The capabilities of next-generation sequencing are well illustrated by RNA sequencing (RNA-Seq), a technique that characterizes transcriptome complexity in a powerful and cost-effective way. RNA-Seq has emerged as a revolutionary tool, offering higher sensitivity and accuracy than older techniques. It also provides an unprecedented ability to detect novel mRNA transcripts and non-coding RNAs (ncRNAs) and to analyze alternative splicing. Because RNA-Seq is a high-throughput technique, the large volumes of data it generates demand bioinformatics-based analysis. Here, we explain how RNA-Seq data can be analyzed, discuss the associated challenges, and provide an overview of the available data analysis methods and tools. We discuss strategies for quality checking, read mapping, and differential expression analysis of transcriptomic data, as well as recently developed strategies for alternative splicing analysis and isoform quantification. We also mention some useful R/Bioconductor packages for these tasks.
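As a concrete illustration of the quality-check step, the sketch below decodes Phred quality scores (Q = -10 log10 P, where P is the base-calling error probability) from FASTQ records and keeps reads whose mean quality reaches a threshold. It assumes Phred+33 (Sanger / Illumina 1.8+) encoding, strict four-line FASTQ records, and an arbitrary cutoff of Q20. The function names are hypothetical and are not taken from any of the surveyed tools.

```python
import io
from typing import Iterator, List, TextIO, Tuple

PHRED_OFFSET = 33  # assumption: Phred+33 (Sanger / Illumina 1.8+) encoding


def parse_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (read_id, sequence, quality_string) from four-line FASTQ records."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of input
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or len(seq) != len(qual):
            raise ValueError(f"malformed FASTQ record: {header!r}")
        yield header[1:], seq, qual


def phred_scores(qual: str) -> List[int]:
    """Decode ASCII quality characters into integer Phred scores."""
    return [ord(c) - PHRED_OFFSET for c in qual]


def error_probability(q: int) -> float:
    """Invert the Phred definition: P_error = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def passing_reads(handle: TextIO, min_mean_q: float = 20.0) -> List[str]:
    """Return IDs of reads whose mean Phred score is at least min_mean_q (hypothetical cutoff)."""
    kept = []
    for read_id, _seq, qual in parse_fastq(handle):
        scores = phred_scores(qual)
        if scores and sum(scores) / len(scores) >= min_mean_q:
            kept.append(read_id)
    return kept


example = io.StringIO(
    "@read1\nACGT\n+\nIIII\n"  # 'I' encodes Q40
    "@read2\nACGT\n+\n####\n"  # '#' encodes Q2
)
print(passing_reads(example))  # ['read1']
```

Filtering on mean quality is only one simple criterion; dedicated QC and trimming tools typically also inspect per-position quality, adapter content, and other biases before mapping.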
