Optimal Tests Shrinking Both Means and Variances Applicable to Microarray Data Analysis

Because microarray data have the “large p small n” characteristic, hypothesis tests based on individual genes often have low average power. Several tests have been proposed to improve power. Among these, the FS test, which uses James-Stein shrinkage to estimate the variances, showed a striking improvement in average power. In this paper, we establish a framework in which the key parameters are modeled with a distribution, and we use it to find an optimal Bayes test that we call the MAP test (MAP stands for Maximum Average Power). Under this framework, the FS test can be derived as an empirical Bayes test that approximates the MAP test obtained by modeling the variances. Modeling both the means and the variances with a distribution yields a MAP statistic that is optimal in terms of average power but computationally intensive. An empirical Bayes test, called the FSS test, is derived as an approximation to the MAP tests and can be computed instantaneously. The FSS statistic shrinks both the means and the variances and has numerically identical average power to the MAP tests. Extensive numerical evidence presented in this paper shows that the proposed test performs uniformly better in average power than the other tests in the literature, including the classical F test, the FS test, the test of Wright and Simon, the moderated t-test, SAM, Efron's t-test, the B-statistic, and Storey's optimal discovery procedure. We also establish a theory indicating that the proposed test is optimal in power when the false discovery rate (FDR) is controlled.
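The abstract does not give the FS or FSS formulas, so the following is only a rough illustration of the variance-shrinkage idea behind them, not the paper's FS, FSS, or MAP statistic. It applies a positive-part James-Stein estimator to the log pooled sample variances of a two-group comparison and plugs the shrunken variances into a t-like statistic; it does not shrink the means. The function names, the normality assumption (so that log s² has variance ψ'(ν/2)), and the bias correction ψ(ν/2) + log(2/ν) are assumptions chosen for this sketch.

```python
import math
import statistics


def digamma(x: float) -> float:
    """Digamma psi(x) for x > 0: shift x upward via psi(x) = psi(x + 1) - 1/x, then use the asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    # ln x - 1/(2x) - 1/(12x^2) + 1/(120x^4) - 1/(252x^6)
    result += math.log(x) - 0.5 / x - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result


def trigamma(x: float) -> float:
    """Trigamma psi'(x) for x > 0 via psi'(x) = psi'(x + 1) + 1/x^2 plus the asymptotic series."""
    result = 0.0
    while x < 6.0:
        result += 1.0 / (x * x)
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    # 1/x + 1/(2x^2) + 1/(6x^3) - 1/(30x^5) + 1/(42x^7) - 1/(30x^9)
    result += inv + 0.5 * inv2 + inv * inv2 * (1.0 / 6 - inv2 * (1.0 / 30 - inv2 * (1.0 / 42 - inv2 / 30)))
    return result


def shrink_log_variances(sample_vars, df):
    """Hypothetical helper: positive-part James-Stein shrinkage of log sample variances toward their mean.

    Assumes normal data, so log(s^2) has variance trigamma(df / 2) for every gene;
    requires at least 4 genes and strictly positive variances.
    """
    g = len(sample_vars)
    if g < 4:
        raise ValueError("James-Stein shrinkage toward the mean needs at least 4 genes")
    z = [math.log(v) for v in sample_vars]
    z_bar = statistics.fmean(z)
    spread = sum((zi - z_bar) ** 2 for zi in z)
    var_z = trigamma(df / 2.0)
    # (g - 3) because the shrinkage target (the mean of z) is itself estimated.
    factor = max(0.0, 1.0 - (g - 3) * var_z / spread) if spread > 0 else 0.0
    # E[log s^2] - log sigma^2 = psi(df/2) + log(2/df); remove it before exponentiating.
    bias = digamma(df / 2.0) + math.log(2.0 / df)
    return [math.exp(z_bar + factor * (zi - z_bar) - bias) for zi in z]


def shrunken_t(group1, group2):
    """Hypothetical helper: t-like statistics for two groups using shrunken pooled variances.

    group1[g] and group2[g] hold the replicates of gene g in each condition;
    every gene is assumed to have the same replicate counts n1 and n2.
    """
    n1, n2 = len(group1[0]), len(group2[0])
    df = n1 + n2 - 2
    diffs, pooled = [], []
    for x, y in zip(group1, group2):
        diffs.append(statistics.fmean(x) - statistics.fmean(y))
        ss = (n1 - 1) * statistics.variance(x) + (n2 - 1) * statistics.variance(y)
        pooled.append(ss / df)
    shrunk = shrink_log_variances(pooled, df)
    scale = 1.0 / n1 + 1.0 / n2
    return [d / math.sqrt(v * scale) for d, v in zip(diffs, shrunk)]
```

Because the shrinkage factor lies in [0, 1], genes keep their ordering by variance while extreme variances are pulled toward a common center, so a gene whose sample variance is small by chance no longer produces an inflated statistic. The sketch says nothing about the null distribution of the statistic; p-values and FDR control would have to come from elsewhere, for example permutation.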

[1] R. Tibshirani, et al. Significance analysis of microarrays applied to the ionizing radiation response. Proceedings of the National Academy of Sciences of the United States of America, 2001.

[2] Christina Kendziorski, et al. On Differential Variability of Expression Ratios: Improving Statistical Inference about Gene Expression Changes from Microarray Data. Journal of Computational Biology, 2001.

[3] Jeffrey T Leek, et al. The optimal discovery procedure for large-scale significance testing, with applications to comparative microarray experiments. Biostatistics, 2007.

[4] John D. Storey. A direct approach to false discovery rates. 2002.

[5] John D. Storey, et al. Empirical Bayes Analysis of a Microarray Experiment. 2001.

[6] B. Efron, et al. Limiting the Risk of Bayes and Empirical Bayes Estimators, Part II: The Empirical Bayes Case. 1972.

[7] Y. Benjamini, et al. Controlling the false discovery rate: a practical and powerful approach to multiple testing. 1995.

[8] John D. Storey. The optimal discovery procedure: a new approach to simultaneous significance testing. 2007.

[9] Ingrid Lönnstedt. Replicated microarray data. 2001.

[10] Christian P. Robert, et al. The Bayesian choice. 1994.

[11] Tiejun Tong, et al. Shrinkage-based Diagonal Discriminant Analysis and Its Applications in High-Dimensional Data. Biometrics, 2009.

[12] Peng Liu, et al. Quick calculation for sample size while controlling false discovery rate with application to microarray analysis. Bioinformatics, 2007.

[13] Pierre Baldi, et al. A Bayesian framework for the analysis of microarray expression data: regularized t-test and statistical inferences of gene changes. Bioinformatics, 2001.

[14] Raphael Gottardo, et al. Flexible empirical Bayes models for differential gene expression. Bioinformatics, 2007.

[15] Richard Simon, et al. A random variance model for detection of differential gene expression in small microarray experiments. Bioinformatics, 2003.

[16] C M Kendziorski, et al. On parametric empirical Bayes methods for comparing multiple groups using replicated gene expression profiles. Statistics in Medicine, 2003.

[17] X. Cui, et al. Statistical tests for differential expression in cDNA microarray experiments. Genome Biology, 2003.

[18] T. Speed, et al. A multivariate empirical Bayes statistic for replicated microarray time course data. arXiv:math/0702685, 2006.

[19] Gordon K Smyth, et al. Linear Models and Empirical Bayes Methods for Assessing Differential Expression in Microarray Experiments. Statistical Applications in Genetics and Molecular Biology, 2011.

[20] Hui-Nien Hung, et al. Maximum Average-Power (MAP) Tests. 2007.

[21] Tiejun Tong, et al. Optimal Shrinkage Estimation of Variances With Applications to Microarray Data Analysis. 2007.

[22] X. Cui, et al. Improved statistical tests for differential gene expression by shrinking variance components estimates. Biostatistics, 2005.