Benchmarking of RNA-sequencing analysis workflows using whole-transcriptome RT-qPCR expression data

RNA-sequencing has become the gold standard for whole-transcriptome gene expression quantification. Multiple algorithms have been developed to derive gene counts from sequencing reads. While a number of benchmarking studies have been conducted, the question remains how individual methods perform at accurately quantifying gene expression levels from RNA-sequencing reads. We performed an independent benchmarking study using RNA-sequencing data from the well established MAQCA and MAQCB reference samples. RNA-sequencing reads were processed using five workflows (Tophat-HTSeq, Tophat-Cufflinks, STAR-HTSeq, Kallisto and Salmon) and resulting gene expression measurements were compared to expression data generated by wet-lab validated qPCR assays for all protein coding genes. All methods showed high gene expression correlations with qPCR data. When comparing gene expression fold changes between MAQCA and MAQCB samples, about 85% of the genes showed consistent results between RNA-sequencing and qPCR data. Of note, each method revealed a small but specific gene set with inconsistent expression measurements. A significant proportion of these method-specific inconsistent genes were reproducibly identified in independent datasets. These genes were typically smaller, had fewer exons, and were lower expressed compared to genes with consistent expression measurements. We propose that careful validation is warranted when evaluating RNA-seq based expression profiles for this specific gene set.

[1]  Rob Patro,et al.  Sailfish enables alignment-free isoform quantification from RNA-seq reads using lightweight algorithms , 2013, Nature Biotechnology.

[2]  Robert Patro,et al.  Sailfish: Alignment-free Isoform Quantification from RNA-seq Reads using Lightweight Algorithms , 2013, ArXiv.

[3]  Günter P. Wagner,et al.  Measurement of mRNA abundance using RNA-seq data: RPKM measure is inconsistent among samples , 2012, Theory in Biosciences.

[4]  Maqc Consortium The MicroArray Quality Control (MAQC) project shows inter- and intraplatform reproducibility of gene expression measurements , 2006, Nature Biotechnology.

[5]  Paul Theodor Pyl,et al.  HTSeq—a Python framework to work with high-throughput sequencing data , 2014, bioRxiv.

[6]  Lior Pachter,et al.  Sequence Analysis , 2020, Definitions.

[7]  Dmitri D. Pervouchine,et al.  A benchmark for RNA-seq quantification pipelines , 2016, Genome Biology.

[8]  Thomas R. Gingeras,et al.  STAR: ultrafast universal RNA-seq aligner , 2013, Bioinform..

[9]  Mick Watson,et al.  Errors in RNA-Seq quantification affect genes of relevance to human disease , 2015, Genome Biology.

[10]  Matthew E. Ritchie,et al.  limma powers differential expression analyses for RNA-sequencing and microarray studies , 2015, Nucleic acids research.

[11]  B. Williams,et al.  Mapping and quantifying mammalian transcriptomes by RNA-Seq , 2008, Nature Methods.

[12]  David P. Kreil,et al.  A comprehensive assessment of RNA-seq accuracy, reproducibility and information content by the Sequencing Quality Control consortium , 2014, Nature Biotechnology.

[13]  B. Frey,et al.  Revealing global regulatory features of mammalian alternative splicing using a quantitative microarray platform. , 2004, Molecular cell.

[14]  Yves Van de Peer,et al.  In situ analysis of cross-hybridisation on microarrays and the inference of expression correlation , 2007, BMC Bioinformatics.

[15]  Xintao Wei,et al.  Erratum: A benchmark for RNA-seq quantification pipelines [Genome Biol. (2016), 17, 74], DOI: 10.1186/s13059-016-0940-1 , 2016 .

[16]  Michal J. Okoniewski,et al.  Hybridization interactions between probesets in short oligo microarrays lead to spurious correlations , 2006, BMC Bioinformatics.

[17]  Mark D. Robinson,et al.  edgeR: a Bioconductor package for differential expression analysis of digital gene expression data , 2009, Bioinform..

[18]  Rob Patro,et al.  Salmon provides fast and bias-aware quantification of transcript expression , 2017, Nature Methods.

[19]  Minghui Wang,et al.  Efficient Test and Visualization of Multi-Set Intersections , 2015, Scientific Reports.

[20]  Hanlee P. Ji,et al.  The MicroArray Quality Control (MAQC) project shows inter- and intraplatform reproducibility of gene expression measurements. , 2006, Nature biotechnology.

[21]  May D. Wang,et al.  Benchmarking RNA-Seq quantification tools , 2013, 2013 35th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC).

[22]  David R. Kelley,et al.  Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks , 2012, Nature Protocols.

[23]  W. Huber,et al.  Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2 , 2014, Genome Biology.

[24]  Frank Speleman,et al.  A novel and universal method for microRNA RT-qPCR data normalization , 2009, Genome Biology.

[25]  Lior Pachter,et al.  Near-optimal probabilistic RNA-seq quantification , 2016, Nature Biotechnology.