Generalised empirical Bayesian methods for discovery of differential data in high-throughput biology

Motivation: High-throughput data are now commonplace in biological research. Rapidly changing technologies and applications mean that novel methods for detecting differential behaviour in a ‘large P, small n’ setting are required at an increasing rate. Such methods are generally developed on an ad hoc basis, which demands repeated development cycles and leads to a lack of standardization between analyses.

Results: We present a generalised method for identifying differential behaviour within high-throughput biological data through empirical Bayesian methods. This approach builds on our baySeq algorithm, which identifies differential expression in RNA-seq data using a negative binomial distribution, and in paired data using a beta-binomial distribution. Here we show how the same empirical Bayesian approach can be applied to any parametric distribution, removing the need for lengthy development of novel methods for differently distributed data. Comparisons with existing methods developed to address specific problems in high-throughput biological data show that these generic methods can achieve equivalent or better performance. We also present a number of enhancements to the basic algorithm that increase flexibility and reduce computational costs.

Availability: The methods are implemented in the R package baySeq (v2), available on Bioconductor at http://www.bioconductor.org/packages/release/bioc/html/baySeq.html.

Contact: tjh48@cam.ac.uk
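The abstract describes the empirical Bayesian approach only at a high level. The following Python sketch shows one way such a comparison of a "no difference" (NDE) model and a "difference" (DE) model could look for count data. It is not baySeq's implementation or its R API. All function names (`nb_logpmf`, `nb_moments`, `empirical_prior`, `posterior_models`) are hypothetical, and the sketch makes these assumptions:

- library sizes are equal;
- parameters are estimated per row by method of moments;
- each model's empirical prior is the set of parameter estimates from a sample of rows;
- the marginal likelihood is a Monte Carlo average over that prior;
- model priors are re-estimated iteratively as the mean posterior probability.

```python
import math
import random
from typing import Callable, Dict, List, Sequence, Tuple

Params = Tuple[float, float]                 # (mu, phi) for the negative binomial
LogDensity = Callable[[int, Params], float]
Estimator = Callable[[Sequence[int]], Params]


def nb_logpmf(y: int, params: Params) -> float:
    """Negative binomial log-pmf with mean mu and variance mu + phi * mu**2."""
    mu, phi = params
    r = 1.0 / phi                            # 'size' parameter
    return (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
            + r * math.log(r / (r + mu)) + y * math.log(mu / (r + mu)))


def nb_moments(counts: Sequence[int]) -> Params:
    """Method-of-moments estimate of (mu, phi), clamped away from zero.
    Assumption: equal library sizes, so mu is not rescaled per sample."""
    n = len(counts)
    mean = sum(counts) / n
    var = sum((c - mean) ** 2 for c in counts) / (n - 1) if n > 1 else mean
    mu = max(mean, 1e-3)
    return mu, max((var - mu) / mu ** 2, 1e-3)


def log_mean_exp(xs: Sequence[float]) -> float:
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs) / len(xs))


def empirical_prior(data, groups, estimate: Estimator, n_sample: int,
                    rng: random.Random) -> Dict[int, List[Params]]:
    """Per-group parameter estimates from a random subset of rows."""
    rows = rng.sample(data, min(n_sample, len(data)))
    prior = {}
    for g in set(groups):
        cols = [j for j, lab in enumerate(groups) if lab == g]
        prior[g] = [estimate([row[j] for j in cols]) for row in rows]
    return prior


def log_marginal(row, groups, prior, logpdf: LogDensity) -> float:
    """log prod_g (1/M) sum_m prod_{j in g} f(y_j | theta_m): Monte Carlo
    average over the empirical prior, with groups treated as independent."""
    total = 0.0
    for g, thetas in prior.items():
        ys = [y for y, lab in zip(row, groups) if lab == g]
        total += log_mean_exp([sum(logpdf(y, th) for y in ys) for th in thetas])
    return total


def posterior_models(data, models, logpdf: LogDensity, estimate: Estimator,
                     n_sample=1000, n_iter=50, seed=0):
    """Return (estimated model priors, per-row posterior model probabilities)."""
    rng = random.Random(seed)
    loglik = []                              # loglik[k][i]: model k, row i
    for groups in models:
        prior = empirical_prior(data, groups, estimate, n_sample, rng)
        loglik.append([log_marginal(row, groups, prior, logpdf) for row in data])
    K = len(models)
    pi = [1.0 / K] * K
    for _ in range(n_iter):
        post = []
        for i in range(len(data)):
            lw = [math.log(pi[k]) + loglik[k][i] for k in range(K)]
            m = max(lw)
            w = [math.exp(x - m) for x in lw]
            s = sum(w)
            post.append([x / s for x in w])
        # Re-estimate model priors as the mean posterior (floored to stay > 0).
        pi = [max(sum(p[k] for p in post) / len(data), 1e-12) for k in range(K)]
    return pi, post


# Hypothetical toy data: 45 rows without a difference, 45 with a ~8-fold change.
flat = [[b, b + 1, b - 1, b, b + 2, b - 1] for b in range(5, 50)]
shift = [[b, b + 1, b - 1, 8 * b, 8 * b + 3, 8 * b - 2] for b in range(5, 50)]
models = [[0, 0, 0, 0, 0, 0],                # NDE: one shared parameter set
          [0, 0, 0, 1, 1, 1]]                # DE: separate parameters per condition
pi, post = posterior_models(flat + shift, models, nb_logpmf, nb_moments)
p_de = [p[1] for p in post]
print(round(pi[1], 2), round(min(p_de[45:]), 3), round(max(p_de[:45]), 3))
```

The likelihood (`logpdf`) and the parameter estimator (`estimate`) are passed in as arguments. In principle, the same averaging and posterior machinery could therefore be reused with another parametric family by supplying a different pair of functions, adapting the observation type where needed (for example, count and total pairs for a beta-binomial). This reflects the abstract's point that the approach is not tied to one distribution, although the details here are illustrative only.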

Thomas J. Hardcastle