Probe-level measurement error improves accuracy in detecting differential gene expression

MOTIVATION: Finding differentially expressed genes is a fundamental objective of a microarray experiment. Numerous methods have been proposed to perform this task. Existing methods are based on point estimates of gene expression level obtained from each microarray experiment. This approach discards potentially useful information about measurement error that can be obtained from an appropriate probe-level analysis. Probabilistic probe-level models can be used to measure gene expression and also to quantify the uncertainty in this measurement. This probe-level measurement error provides useful information that can help in the identification of differentially expressed genes.

RESULTS: We propose a Bayesian method to incorporate probe-level measurement error into the detection of differentially expressed genes from replicated experiments. A variational approximation is used for efficient parameter estimation. We compare this approximation with MAP and MCMC parameter estimation in terms of computational efficiency and accuracy. The method is used to calculate the probability of positive log-ratio (PPLR) of expression levels between conditions. Using the measurements from a recently developed Affymetrix probe-level model, multi-mgMOS, we test PPLR on a spike-in dataset and a mouse time-course dataset. Results show that the inclusion of probe-level measurement error improves accuracy in detecting differential gene expression.

AVAILABILITY: The MAP approximation and variational inference described in this paper have been implemented in an R package, pplr. The MCMC method is implemented in Matlab. Both implementations are available from http://umber.sbs.man.ac.uk/resources/puma.
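As a rough illustration of the PPLR statistic mentioned in the abstract, the sketch below computes the probability that the log-expression level in one condition exceeds that in another. It assumes, purely for illustration, that the posterior of each condition's mean log-expression is an independent Gaussian summarised by a mean and a variance; the function name `pplr` and its parameters are hypothetical and are not the interface of the R package. It does not reproduce the variational, MAP, or MCMC estimation described in the abstract.

```python
import math


def pplr(mean_a, var_a, mean_b, var_b):
    """Probability that condition A's log-expression exceeds condition B's.

    Assumption (not taken from the abstract): each condition's posterior is
    an independent Gaussian with the given mean and variance, so the
    difference is Gaussian with mean (mean_a - mean_b) and variance
    (var_a + var_b).
    """
    total_var = var_a + var_b
    if total_var <= 0:
        raise ValueError("the summed posterior variance must be positive")
    z = (mean_a - mean_b) / math.sqrt(total_var)
    # Standard normal CDF evaluated at z, written with the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


if __name__ == "__main__":
    # Same difference in means, different levels of posterior uncertainty.
    confident = pplr(8.0, 0.05, 7.0, 0.05)
    uncertain = pplr(8.0, 1.0, 7.0, 1.0)
    print(f"low uncertainty:  PPLR = {confident:.3f}")
    print(f"high uncertainty: PPLR = {uncertain:.3f}")
```

Under these assumptions, two genes with the same point-estimate log-ratio receive different PPLR values when their posterior variances differ, which is the kind of information a method based only on point estimates cannot use. Values near 1 suggest up-regulation in condition A, values near 0 suggest down-regulation, and values near 0.5 indicate no clear evidence of change.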
