Empirical Bayesian analysis of paired high-throughput sequencing data with a beta-binomial distribution

BackgroundPairing of samples arises naturally in many genomic experiments; for example, gene expression in tumour and normal tissue from the same patients. Methods for analysing high-throughput sequencing data from such experiments are required to identify differential expression, both within paired samples and between pairs under different experimental conditions.ResultsWe develop an empirical Bayesian method based on the beta-binomial distribution to model paired data from high-throughput sequencing experiments. We examine the performance of this method on simulated and real data in a variety of scenarios. Our methods are implemented as part of the RbaySeq package (versions 1.11.6 and greater) available from Bioconductor (http://www.bioconductor.org).ConclusionsWe compare our approach to alternatives based on generalised linear modelling approaches and show that our method offers significant gains in performance on simulated data. In testing on real data from oral squamous cell carcinoma patients, we discover greater enrichment of previously identified head and neck squamous cell carcinoma associated gene sets than has previously been achieved through a generalised linear modelling approach, suggesting that similar gains in performance may be found in real data. Our methods thus show real and substantial improvements in analyses of high-throughput sequencing data from paired samples.

[1]  W. Huber,et al.  which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. MAnorm: a robust model for quantitative comparison of ChIP-Seq data sets , 2011 .

[2]  Gordon K Smyth,et al.  Statistical Applications in Genetics and Molecular Biology Linear Models and Empirical Bayes Methods for Assessing Differential Expression in Microarray Experiments , 2011 .

[3]  R H Hruban,et al.  Gene expression profiles in normal and cancer cells. , 1997, Science.

[4]  H. Kuo,et al.  The Evolving Transcriptome of Head and Neck Squamous Cell Carcinoma: A Systematic Review , 2008, PloS one.

[5]  James R. Knight,et al.  Genome sequencing in microfabricated high-density picolitre reactors , 2005, Nature.

[6]  M. Stephens,et al.  RNA-seq: an assessment of technical reproducibility and comparison with gene expression arrays. , 2008, Genome research.

[7]  Thomas J. Hardcastle,et al.  baySeq: Empirical Bayesian methods for identifying differential expression in sequence count data , 2010, BMC Bioinformatics.

[8]  M. Robinson,et al.  A scaling normalization method for differential expression analysis of RNA-seq data , 2010, Genome Biology.

[9]  M. Evans,et al.  Methods for Approximating Integrals in Statistics with Special Emphasis on Bayesian Integration Problems , 1995 .

[10]  Sandrine Dudoit,et al.  Evaluation of statistical methods for normalization and differential expression in mRNA-Seq experiments , 2010, BMC Bioinformatics.

[11]  Mark D. Robinson,et al.  Moderated statistical tests for assessing differences in tag abundance , 2007, Bioinform..

[12]  M. C. Jones The Minimax Distribution : A Beta-Type Distribution With Some Tractability Advantages , 2007 .

[13]  Crispin J. Miller,et al.  Relation of a hypoxia metagene derived from head and neck cancer to prognosis of multiple cancers. , 2007, Cancer research.

[14]  D. Rickman,et al.  Prediction of future metastasis and molecular characterization of head and neck squamous-cell carcinoma based on transcriptome and genome analysis by microarrays , 2008, Oncogene.

[15]  Marco Beccuti,et al.  Optimizing a Massive Parallel Sequencing Workflow for Quantitative miRNA Expression Analysis , 2012, PloS one.

[16]  David I. Smith,et al.  Tumor Transcriptome Sequencing Reveals Allelic Expression Imbalances Associated with Copy Number Alterations , 2010, PloS one.

[17]  Vanessa M Kvam,et al.  A comparison of statistical methods for detecting differentially expressed genes from RNA-seq data. , 2012, American journal of botany.

[18]  D. Bentley,et al.  Whole-genome re-sequencing. , 2006, Current opinion in genetics & development.

[19]  Fred A. Wright,et al.  A powerful and flexible approach to the analysis of RNA sequence count data , 2011, Bioinform..

[20]  E. Mardis The impact of next-generation sequencing technology on genetics. , 2008, Trends in genetics : TIG.

[21]  Davis J. McCarthy,et al.  Differential expression analysis of multifactor RNA-Seq experiments with respect to biological variation , 2012, Nucleic acids research.

[22]  Olivier Poch,et al.  Identification of genes associated with tumorigenesis and metastatic potential of hypopharyngeal cancer by microarray analysis , 2004, Oncogene.

[23]  Pablo Tamayo,et al.  Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles , 2005, Proceedings of the National Academy of Sciences of the United States of America.

[24]  S. Schuster Next-generation sequencing transforms today's biology , 2008, Nature Methods.