Testing Differential Expression in Nonoverlapping Gene Pairs: A New Perspective for the Empirical Bayes Method

Current methods of significance testing in microarray gene expression profiling are highly unstable and tend to have very low power. These undesirable properties stem from the nature of multiple testing procedures and from extremely strong, long-range correlations between gene expression levels. In an earlier publication, we identified a special structure in gene expression data that produces a sequence of weakly dependent random variables. This structure, termed the delta-sequence, lies at the heart of a new methodology for selecting differentially expressed genes in nonoverlapping gene pairs. The proposed method has two distinct advantages: (1) it leads to dramatic gains in the mean numbers of true and false discoveries and in the stability of the testing results; and (2) its outcomes are entirely free from log-additive array-specific technical noise. We demonstrate the usefulness of this approach in conjunction with the nonparametric empirical Bayes method, whose performance improves significantly under the proposed modification. The new paradigm arising from the existence of the delta-sequence in biological data offers considerable scope for future developments in this area of methodological research.
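The second advantage can be illustrated with a small sketch. It assumes that each array contributes one additive term to every log-expression value measured on it, and that the quantity tested for a gene pair is the difference of the two log-expressions. Pairing genes by consecutive index, the function name `pair_differences`, and the simulated values are assumptions made purely for illustration; the abstract does not specify how the pairs are actually formed.

```python
import random


def pair_differences(log_expr):
    """Return per-array differences for nonoverlapping gene pairs.

    log_expr is a list of genes; each gene is a list of log-expression
    values, one per array. Genes are paired as (0, 1), (2, 3), ...
    (a hypothetical pairing rule, not taken from the paper).
    """
    n_pairs = len(log_expr) // 2
    return [
        [a - b for a, b in zip(log_expr[2 * k], log_expr[2 * k + 1])]
        for k in range(n_pairs)
    ]


rng = random.Random(0)
n_genes, n_arrays = 6, 4

# Simulated "true" log-expression signal for each gene on each array.
signal = [[rng.gauss(8.0, 1.0) for _ in range(n_arrays)] for _ in range(n_genes)]

# One additive noise term per array, shared by all genes on that array.
array_noise = [rng.gauss(0.0, 2.0) for _ in range(n_arrays)]
observed = [[s + e for s, e in zip(row, array_noise)] for row in signal]

clean = pair_differences(signal)
noisy = pair_differences(observed)

# The array term appears in both members of a pair, so it cancels
# in the difference (up to floating-point rounding).
max_gap = max(
    abs(c - d) for row_c, row_d in zip(clean, noisy) for c, d in zip(row_c, row_d)
)
print(max_gap)
```

Because the shared array term cancels exactly in each difference, no normalization step is needed to remove it under this noise model; the printed gap reflects rounding error only.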
