EBSeq: an empirical Bayes hierarchical model for inference in RNA-seq experiments

Motivation: Messenger RNA expression is important in normal development and differentiation, as well as in the manifestation of disease. RNA-seq experiments allow for the identification of differentially expressed (DE) genes and their corresponding isoforms on a genome-wide scale. However, statistical methods are required to ensure that accurate identifications are made. A number of methods exist for identifying DE genes, but far fewer are available for identifying DE isoforms. When isoform DE is of interest, investigators often apply gene-level (count-based) methods directly to estimates of isoform counts. Doing so is not recommended. In short, estimating isoform expression is relatively straightforward for some groups of isoforms, but more challenging for others. This results in estimation uncertainty that varies across isoform groups. Count-based methods were not designed to accommodate this varying uncertainty; consequently, applying them to isoform inference reduces power for some classes of isoforms and increases false discoveries for others.

Results: Taking advantage of the merits of empirical Bayesian methods, we have developed EBSeq for identifying DE isoforms in an RNA-seq experiment comparing two or more biological conditions. Results demonstrate that EBSeq substantially improves power and performance for identifying DE isoforms. EBSeq also proves to be a robust approach for identifying DE genes.

Availability and implementation: An R package containing examples and sample datasets is available at http://www.biostat.wisc.edu/
