Correlation Between Gene Expression Levels and Limitations of the Empirical Bayes Methodology for Finding Differentially Expressed Genes

Stochastic dependence between gene expression levels in microarray data is of critical importance for methods of statistical inference that pool test statistics across genes. The empirical Bayes methodology in its nonparametric and parametric formulations, as well as closely related methods that employ a two-component mixture model, are typical examples. It is frequently assumed that dependence between gene expression levels (or the associated test statistics) is weak enough to justify applying such methods to select differentially expressed genes. By applying resampling techniques to simulated and real biological data sets, we have studied the potential impact of correlation between gene expression levels on statistical inference based on the empirical Bayes methodology. These analyses provide evidence that the impact may be quite strong, leading to a high variance in the number of genes declared differentially expressed. The study also pinpoints the specific components of the empirical Bayes method in which this effect manifests itself.
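
The variance effect described above can be illustrated with a toy Monte Carlo simulation. The sketch below is not the resampling procedure or the empirical Bayes method used in the study. It assumes null test statistics with a single shared factor (equal pairwise correlation `rho`) and counts how many exceed a fixed threshold, which is a much simpler selection rule. The function name `rejection_counts` and all parameter values are hypothetical choices made for illustration only.

```python
import math
import random
import statistics


def rejection_counts(rho, n_genes=500, n_sims=300, threshold=2.0, seed=0):
    """Count, per simulated study, how many null z-statistics exceed the threshold.

    Assumption (illustrative only): every pair of genes has correlation rho,
    induced by one shared standard normal factor per study.
    """
    rng = random.Random(seed)
    a, b = math.sqrt(rho), math.sqrt(1.0 - rho)
    counts = []
    for _ in range(n_sims):
        shared = rng.gauss(0.0, 1.0)  # factor common to all genes in this study
        count = sum(
            abs(a * shared + b * rng.gauss(0.0, 1.0)) > threshold
            for _ in range(n_genes)
        )
        counts.append(count)
    return counts


counts_indep = rejection_counts(rho=0.0)  # independent statistics
counts_corr = rejection_counts(rho=0.4)   # equicorrelated statistics

for label, counts in (("independent", counts_indep), ("correlated", counts_corr)):
    print(label, "mean:", statistics.mean(counts), "sd:", statistics.pstdev(counts))
```

Each statistic has the same standard normal marginal distribution in both settings, so the expected count of exceedances is unchanged. Only the spread of the count across simulated studies differs, which is the kind of instability the abstract attributes to correlation.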
