Comparative evaluation of isoform-level gene expression estimation algorithms for RNA-seq and exon-array platforms

Abstract

Given that the majority of multi-exon genes generate diverse functional products, it is important to evaluate expression at the isoform level. Previous studies have demonstrated strong gene-level correlations between RNA sequencing (RNA-seq) and microarray platforms, but have not examined their concordance at the isoform level. We performed transcript abundance estimation on raw RNA-seq and exon-array expression profiles available for a common set of glioblastoma multiforme samples from The Cancer Genome Atlas using different analysis pipelines, and compared both the isoform- and gene-level expression estimates between programs and platforms. The results showed better concordance between RNA-seq/exon-array and reverse transcription-quantitative polymerase chain reaction (RT-qPCR) platforms for fold change estimates than for raw abundance estimates, suggesting that fold change normalization against a control is an important step for integrating expression data across platforms. Based on RT-qPCR validations, the eXpress and Multi-Mapping Bayesian Gene eXpression (MMBGX) programs achieved the best performance for the RNA-seq and exon-array platforms, respectively, in deriving isoform-level fold change values. While eXpress achieved the highest correlation with the RT-qPCR and exon-array (MMBGX) results overall, RSEM was more highly correlated with MMBGX for the subset of transcripts that are highly variable across the samples. eXpress appears to be most successful in discriminating lowly expressed transcripts, but IsoformEx and RSEM correlate more strongly with MMBGX for highly expressed transcripts. The results also underscore how potentially important isoform-level expression changes can be masked by gene-level estimates, and demonstrate that exon arrays yield results comparable to RNA-seq for evaluating isoform-level expression changes.
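The central methodological point above, that normalizing each platform against a common control before comparison improves cross-platform concordance, can be illustrated with a small toy sketch. This is not the authors' pipeline: the data layout (sample mapped to transcript mapped to abundance), the pseudocount, the use of Pearson correlation on log2 fold changes, and the made-up per-transcript "affinity" factors standing in for platform-specific bias are all assumptions for illustration. The same sketch also shows how two isoforms changing in opposite directions can cancel out at the gene level.

```python
import math
from statistics import mean


def log2_fold_changes(expr, control, pseudocount=1.0):
    """Log2 fold change of each transcript in each non-control sample,
    relative to the same transcript in the control sample.

    expr maps sample -> {transcript: abundance}; returns {(sample, transcript): log2FC}.
    """
    ctrl = expr[control]
    fold_changes = {}
    for sample, values in expr.items():
        if sample == control:
            continue
        for tx, x in values.items():
            fold_changes[(sample, tx)] = math.log2((x + pseudocount) / (ctrl[tx] + pseudocount))
    return fold_changes


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = math.sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))
    return num / den


def raw_concordance(platform_a, platform_b):
    """Correlate raw abundances over all shared (sample, transcript) pairs."""
    keys = sorted((s, tx) for s in platform_a for tx in platform_a[s]
                  if tx in platform_b.get(s, {}))
    return pearson([platform_a[s][tx] for s, tx in keys],
                   [platform_b[s][tx] for s, tx in keys])


def cross_platform_concordance(platform_a, platform_b, control, pseudocount=1.0):
    """Correlate log2 fold changes (each vs. the shared control) across two platforms."""
    fc_a = log2_fold_changes(platform_a, control, pseudocount)
    fc_b = log2_fold_changes(platform_b, control, pseudocount)
    keys = sorted(fc_a.keys() & fc_b.keys())
    return pearson([fc_a[k] for k in keys], [fc_b[k] for k in keys])


def gene_log2_fold_change(isoforms_ctrl, isoforms_case, pseudocount=0.0):
    """Gene-level log2 fold change computed from summed isoform abundances."""
    return math.log2((sum(isoforms_case) + pseudocount) / (sum(isoforms_ctrl) + pseudocount))


# Hypothetical toy data: "exon_array" is "rnaseq" multiplied by a made-up,
# transcript-specific affinity factor that mimics platform-specific bias.
rnaseq = {
    "ctrl": {"t1": 10, "t2": 100, "t3": 50, "t4": 5},
    "s1": {"t1": 20, "t2": 50, "t3": 100, "t4": 5},
    "s2": {"t1": 5, "t2": 200, "t3": 25, "t4": 20},
}
affinity = {"t1": 20.0, "t2": 0.125, "t3": 1.0, "t4": 50.0}
exon_array = {s: {tx: x * affinity[tx] for tx, x in v.items()} for s, v in rnaseq.items()}

if __name__ == "__main__":
    print("raw-abundance r:", round(raw_concordance(rnaseq, exon_array), 3))
    print("log2FC r:", round(cross_platform_concordance(rnaseq, exon_array, "ctrl", pseudocount=0.0), 3))
    # Two isoforms of one hypothetical gene switch in opposite directions.
    iso_ctrl, iso_case = [50, 10], [10, 50]
    print("isoform log2FC:", [round(math.log2(b / a), 2) for a, b in zip(iso_ctrl, iso_case)])
    print("gene log2FC:", gene_log2_fold_change(iso_ctrl, iso_case))
```

On this toy data the raw-abundance correlation is negative while the log2 fold-change correlation is 1, because a per-transcript scaling factor cancels in the within-platform ratio. In the two-isoform example each isoform changes five-fold, yet the gene-level fold change is 0. A pseudocount of 0 is used only because the toy data contain no zeros; real pipelines typically need a positive pseudocount or another way to handle zero counts.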
