Bayesian Hierarchical Model for Differential Gene Expression Using RNA-Seq Data

We introduce model-based Bayesian inference to screen for differentially expressed genes using RNA-seq data. RNA-seq is a high-throughput next-generation sequencing application that measures the expression of messenger RNA. We propose a Bayesian hierarchical model that provides coherent, fast, and robust inference for differential gene expression experiments, i.e., experiments carried out to learn about differences in gene expression under two biological conditions. The proposed model works directly with position-specific read counts, which minimizes the data preprocessing required and makes full use of the available information. It also includes mechanisms to automatically discount outliers at the level of positions within genes. The method combines gene-level information across replicates and reports coherent posterior probabilities of differential expression at the gene level. An implementation is available as a public domain R package.
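To make the notion of a gene-level posterior probability of differential expression concrete, the following Python sketch works through a deliberately simplified toy calculation. It is not the hierarchical model proposed here. It assumes that, conditional on the total reads at a gene, the share of reads from condition A is fixed at `p0` when there is no differential expression and follows a Beta(`a`, `b`) prior otherwise. The function name and the parameters `p0`, `prior_de`, `a`, and `b` are all hypothetical choices made for illustration. The toy also sums the position-specific counts and does no outlier discounting, so it shows only the final posterior-probability step.

```python
from math import exp, lgamma, log


def log_beta(a, b):
    """Log of the Beta function B(a, b), computed via log-gamma."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def posterior_prob_de(counts_a, counts_b, p0=0.5, prior_de=0.1, a=1.0, b=1.0):
    """Toy posterior probability that one gene is differentially expressed.

    counts_a, counts_b: per-position read counts for the gene under
    conditions A and B (hypothetical inputs; summed here for simplicity).
    p0: assumed share of reads from A when expression is equal.
    prior_de: assumed prior probability of differential expression.
    a, b: assumed Beta prior parameters for the share under the alternative.
    """
    k = sum(counts_a)  # total reads under condition A
    m = sum(counts_b)  # total reads under condition B

    # Log marginal likelihood with the share fixed at p0 (no difference).
    # Binomial coefficients are omitted because they cancel in the ratio.
    log_m0 = k * log(p0) + m * log(1.0 - p0)

    # Log marginal likelihood with the share integrated over Beta(a, b).
    log_m1 = log_beta(a + k, b + m) - log_beta(a, b)

    # Posterior log-odds = prior log-odds + log Bayes factor.
    log_odds = log(prior_de) - log(1.0 - prior_de) + log_m1 - log_m0

    # Numerically stable logistic transform.
    if log_odds >= 0:
        return 1.0 / (1.0 + exp(-log_odds))
    z = exp(log_odds)
    return z / (1.0 + z)


if __name__ == "__main__":
    # Made-up counts at three positions of two hypothetical genes.
    print(posterior_prob_de([10, 12, 9], [11, 10, 10]))
    print(posterior_prob_de([40, 35, 50], [5, 4, 6]))
```

With equal library sizes, a gene whose reads split roughly evenly between conditions gets a probability below the prior, while a strongly unbalanced gene gets a probability close to one. With unequal library sizes, `p0` would be set to the fraction of total reads that come from condition A.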
