EBSeq: an empirical Bayes hierarchical model for inference in RNA-seq experiments

MOTIVATION Messenger RNA expression is important in normal development and differentiation, as well as in manifestation of disease. RNA-seq experiments allow for the identification of differentially expressed (DE) genes and their corresponding isoforms on a genome-wide scale. However, statistical methods are required to ensure that accurate identifications are made. A number of methods exist for identifying DE genes, but far fewer are available for identifying DE isoforms. When isoform DE is of interest, investigators often apply gene-level (count-based) methods directly to estimates of isoform counts. Doing so is not recommended. In short, estimating isoform expression is relatively straightforward for some groups of isoforms, but more challenging for others. This results in estimation uncertainty that varies across isoform groups. Count-based methods were not designed to accommodate this varying uncertainty, and consequently, application of them for isoform inference results in reduced power for some classes of isoforms and increased false discoveries for others. RESULTS Taking advantage of the merits of empirical Bayesian methods, we have developed EBSeq for identifying DE isoforms in an RNA-seq experiment comparing two or more biological conditions. Results demonstrate substantially improved power and performance of EBSeq for identifying DE isoforms. EBSeq also proves to be a robust approach for identifying DE genes. AVAILABILITY AND IMPLEMENTATION An R package containing examples and sample datasets is available at http://www.biostat.wisc.edu/kendzior/EBSEQ/. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.

[1]  David R. Kelley,et al.  Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks , 2012, Nature Protocols.

[2]  Cole Trapnell,et al.  Ultrafast and memory-efficient alignment of short DNA sequences to the human genome , 2009, Genome Biology.

[3]  Michael J. Ziller,et al.  Reference Maps of Human ES and iPS Cell Variation Enable High-Throughput Characterization of Pluripotent Cell Lines , 2011, Cell.

[4]  Colin N. Dewey,et al.  RNA-Seq gene expression estimation with read mapping uncertainty , 2009, Bioinform..

[5]  Mark D. Robinson,et al.  Moderated statistical tests for assessing differences in tag abundance , 2007, Bioinform..

[6]  Mark Gerstein,et al.  IQSeq: Integrated Isoform Quantification Analysis Based on Next-Generation Sequencing , 2012, PloS one.

[7]  Hanlee P. Ji,et al.  The MicroArray Quality Control (MAQC) project shows inter- and intraplatform reproducibility of gene expression measurements. , 2006, Nature biotechnology.

[8]  Fred A. Wright,et al.  A powerful and flexible approach to the analysis of RNA sequence count data , 2011, Bioinform..

[9]  Yufeng Liu,et al.  FDM: a graph-based statistical method to detect differential transcription using RNA-seq data , 2011, Bioinform..

[10]  W. Huber,et al.  Detecting differential usage of exons from RNA-seq data , 2012, Genome research.

[11]  Eric T. Wang,et al.  Alternative Isoform Regulation in Human Tissue Transcriptomes , 2008, Nature.

[12]  W. J. Dixon,et al.  Analysis of Extreme Values , 1950 .

[13]  M. Robinson,et al.  A scaling normalization method for differential expression analysis of RNA-seq data , 2010, Genome Biology.

[14]  Jennifer M. Bolin,et al.  Proteomic and phosphoproteomic comparison of human ES and iPS cells , 2011, Nature Methods.

[15]  D. Rubin,et al.  Maximum likelihood from incomplete data via the EM - algorithm plus discussions on the paper , 1977 .

[16]  Wing Hung Wong,et al.  Statistical inferences for isoform expression in RNA-Seq , 2009, Bioinform..

[17]  Cole Trapnell,et al.  Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. , 2010, Nature biotechnology.

[18]  E. Wang,et al.  Analysis and design of RNA sequencing experiments for identifying isoform regulation , 2010, Nature Methods.

[19]  Jun S. Song,et al.  Incomplete DNA methylation underlies a transcriptional memory of somatic cells in human iPS cells , 2011, Nature Cell Biology.

[20]  Lior Pachter,et al.  Sequence Analysis , 2020, Definitions.

[21]  Thomas J. Hardcastle,et al.  baySeq: Empirical Bayesian methods for identifying differential expression in sequence count data , 2010, BMC Bioinformatics.

[22]  Sandrine Dudoit,et al.  Evaluation of statistical methods for normalization and differential expression in mRNA-Seq experiments , 2010, BMC Bioinformatics.

[23]  J. G. Patton,et al.  Alternative splicing in the control of gene expression. , 1989, Annual review of genetics.

[24]  Brian E. Howard,et al.  Towards reliable isoform quantification using RNA-SEQ data , 2009, 2009 IEEE International Conference on Bioinformatics and Biomedicine.

[25]  Colin N. Dewey,et al.  RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome , 2011, BMC Bioinformatics.

[26]  S. Stamm,et al.  Function of alternative splicing. , 2013, Gene.

[27]  Michael Boutros,et al.  The head-regeneration transcriptome of the planarian Schmidtea mediterranea , 2011, Genome Biology.

[28]  W. Huber,et al.  which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. MAnorm: a robust model for quantitative comparison of ChIP-Seq data sets , 2011 .

[29]  Ion I. Mandoiu,et al.  Estimation of alternative splicing isoform frequencies from RNA-Seq data , 2010, Algorithms for Molecular Biology.

[30]  David G Hendrickson,et al.  Differential analysis of gene regulation at transcript resolution with RNA-seq , 2012, Nature Biotechnology.

[31]  Antti Honkela,et al.  Identifying differentially expressed transcripts from RNA-seq data with biological variation , 2011, Bioinform..