An empirical Bayes adjustment to increase the sensitivity of detecting differentially expressed genes in microarray experiments

MOTIVATION: Detecting differentially expressed genes is one of the major goals of microarray experiments. Performing a separate pairwise comparison for each gene is not appropriate unless the overall (experimentwise) type I error rate is controlled. Dudoit et al. advocated permutation-based step-down P-value adjustments to correct the observed significance levels of the individual (per-gene) two-sample t-tests.

RESULTS: In this paper, we consider an ANOVA formulation of the gene expression levels across multiple tissue types. We provide resampling-based step-down adjustments to correct the observed significance levels of the individual ANOVA t-tests for each gene and each pair of tissue types. More importantly, we introduce a novel empirical Bayes adjustment to the t-test statistics that can be incorporated into the step-down procedure. Using simulated data, we show that the empirical Bayes adjustment improved the sensitivity of detecting differentially expressed genes by up to 16%, while maintaining a high level of specificity. The adjustment also reduces the false non-discovery rate somewhat, at the cost of a modest increase in the false discovery rate. We illustrate our approach using a human colon cancer dataset consisting of oligonucleotide arrays of normal, adenoma and carcinoma cells. About 50 genes were declared to have statistically significant differential expression when comparing normal with adenoma cells, and about five when comparing adenoma with carcinoma cells. This list includes genes previously known to be associated with colon cancer as well as some novel genes.

AVAILABILITY: R code for the empirical Bayes adjustment and for step-down P-value calculation via resampling is available from the supplementary website.

SUPPLEMENTARY INFORMATION: http://www.mathstat.gsu.edu/~matsnd/EB/supp.htm
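The resampling-based step-down adjustment can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the authors' R code. It uses two tissue groups rather than the full ANOVA formulation, and it follows the Westfall-Young step-down maxT algorithm [9] in the permutation form used by Dudoit et al. [17]. The empirical Bayes adjustment itself is not spelled out here, so the optional `d0` argument swaps in a different, simpler technique: shrinking each gene's pooled variance toward the median variance across genes. The function names `t_stats` and `stepdown_maxT`, the parameter `d0`, and the simulated data are all hypothetical.

```python
import numpy as np


def t_stats(x, labels, d0=0.0):
    """Pooled-variance two-sample t statistic for each gene (row of x).

    x: genes-by-arrays matrix; labels: boolean array, True for group A.
    If d0 > 0, each gene's variance is shrunk toward the median variance
    across genes with prior weight d0. This is a hypothetical stand-in for
    an empirical Bayes adjustment, NOT the paper's method.
    """
    a, b = x[:, labels], x[:, ~labels]
    na, nb = a.shape[1], b.shape[1]
    df = na + nb - 2
    s2 = ((na - 1) * a.var(axis=1, ddof=1) + (nb - 1) * b.var(axis=1, ddof=1)) / df
    if d0 > 0:
        s2 = (d0 * np.median(s2) + df * s2) / (d0 + df)
    return (a.mean(axis=1) - b.mean(axis=1)) / np.sqrt(s2 * (1 / na + 1 / nb))


def stepdown_maxT(x, labels, d0=0.0, n_perm=1000, seed=0):
    """Westfall-Young step-down maxT adjusted p-values by permuting labels."""
    rng = np.random.default_rng(seed)
    obs = np.abs(t_stats(x, labels, d0))
    order = np.argsort(-obs)              # most significant gene first
    obs_sorted = obs[order]
    exceed = np.zeros(obs.size)
    for _ in range(n_perm):
        null = np.abs(t_stats(x, rng.permutation(labels), d0))[order]
        # u[j] = max permuted |t| over genes ranked j, j+1, ..., last.
        u = np.maximum.accumulate(null[::-1])[::-1]
        exceed += u >= obs_sorted
    # Enforce monotonicity so adjusted p-values respect the observed ranking.
    p_sorted = np.maximum.accumulate(exceed / n_perm)
    p_adj = np.empty_like(p_sorted)
    p_adj[order] = p_sorted
    return p_adj


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x = rng.normal(size=(200, 20))        # simulated: 200 genes, 10 + 10 arrays
    x[:20, :10] += 5.0                    # first 20 genes truly shifted in group A
    labels = np.arange(20) < 10
    for d0 in (0.0, 4.0):
        p = stepdown_maxT(x, labels, d0=d0, n_perm=500)
        print("d0 =", d0, "genes with adjusted p <= 0.05:", int((p <= 0.05).sum()))
```

Because the identical statistic is recomputed on every permuted labeling, any per-gene adjustment can be dropped into the loop. Adjustments that borrow strength across genes, as the median-based shrinkage here does, may only approximately preserve familywise error control, which is one reason to check sensitivity and false discovery rates on simulated data.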

[1] M. K. Kerr, et al. Bootstrapping cluster analysis: Assessing the reliability of conclusions from microarray experiments. Proceedings of the National Academy of Sciences of the United States of America, 2001.

[2] H. Robbins. The Empirical Bayes Approach to Statistical Decision Problems. 1964.

[3] Christina Kendziorski, et al. On Differential Variability of Expression Ratios: Improving Statistical Inference about Gene Expression Changes from Microarray Data. J. Comput. Biol., 2001.

[4] M. Cohen, et al. Guanylin mRNA expression in human intestine and colorectal adenocarcinoma. Laboratory Investigation; a Journal of Technical Methods and Pathology, 1998.

[5] Rafael A. Irizarry, et al. Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostatistics, 2003.

[6] U. Alon, et al. Transcriptional gene expression profiles of colorectal adenoma, adenocarcinoma, and normal tissue examined by oligonucleotide arrays. Cancer Research, 2001.

[7] Russ B. Altman, et al. Nonparametric methods for identifying differentially expressed genes in microarray data. Bioinformatics, 2002.

[8] Gary A. Churchill, et al. Analysis of Variance for Gene Expression Microarray Data. J. Comput. Biol., 2000.

[9] S. S. Young, et al. Resampling-Based Multiple Testing: Examples and Methods for p-Value Adjustment. 1993.

[10] Robert Tibshirani, et al. Statistical Significance for Genome-Wide Experiments. 2003.

[11] R. Tibshirani, et al. Empirical Bayes methods and false discovery rates for microarrays. Genetic Epidemiology, 2002.

[12] Marina Vannucci, et al. Gene selection: a Bayesian variable selection approach. Bioinformatics, 2003.

[13] Pierre R. Bushel, et al. Statistical Analysis of a Gene Expression Microarray Experiment with Replication. 2002.

[14] Ingrid Lönnstedt. Replicated microarray data. 2001.

[15] J. Ibrahim, et al. Bayesian Models for Gene Expression With DNA Microarray Data. 2002.

[16] John D. Storey, et al. Empirical Bayes Analysis of a Microarray Experiment. 2001.

[17] S. Dudoit, et al. Statistical Methods for Identifying Differentially Expressed Genes in Replicated cDNA Microarray Experiments. 2002.

[18] D. Cox, et al. Sequential prediction bounds for identifying differentially expressed genes in replicated microarray experiments. 2005.

[19] B. Efron, et al. Data Analysis Using Stein's Estimator and its Generalizations. 1975.

[20] Terence P. Speed, et al. Normalization for cDNA microarray data. SPIE BiOS, 2001.