Correlation and Large-Scale Simultaneous Significance Testing

Large-scale hypothesis testing problems, with hundreds or thousands of test statistics z_i to consider at once, have become familiar in current practice. Applications of popular analysis methods, such as false discovery rate techniques, do not require independence of the z_i's, but their accuracy can be compromised in high-correlation situations. This article presents computational and theoretical methods for assessing the size and effect of correlation in large-scale testing. A simple theory leads to the identification of a single omnibus measure of correlation for the z_i's order statistic. The theory relates to the correct choice of a null distribution for simultaneous significance testing and its effect on inference.
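
The claim that accuracy can suffer under high correlation can be illustrated with a small simulation. The sketch below is not the article's method. It assumes a hypothetical one-factor model in which every null z-value shares a common Gaussian component, so that each z-value is still marginally N(0, 1) but any two have correlation rho. The function name, the threshold of 2, and the sample sizes are all illustrative choices. The sketch records how much the fraction of null z-values beyond the threshold varies from one replication to the next.

```python
import math
import random
import statistics


def tail_fraction_spread(n_tests, rho, n_reps, threshold=2.0, seed=0):
    """Mean and std. dev., across replications, of the fraction of
    null z-values with |z| > threshold.

    Assumed model (illustrative only): z_i = sqrt(rho) * g + sqrt(1 - rho) * e_i,
    with g and e_i independent N(0, 1), so each z_i is marginally N(0, 1)
    and every pair of z-values has correlation rho.
    """
    rng = random.Random(seed)
    a = math.sqrt(rho)
    b = math.sqrt(1.0 - rho)
    fractions = []
    for _ in range(n_reps):
        shared = rng.gauss(0.0, 1.0)  # common factor for this replication
        exceed = sum(
            1
            for _ in range(n_tests)
            if abs(a * shared + b * rng.gauss(0.0, 1.0)) > threshold
        )
        fractions.append(exceed / n_tests)
    return statistics.mean(fractions), statistics.pstdev(fractions)


# Independent null z-values versus strongly correlated ones.
indep_mean, indep_sd = tail_fraction_spread(n_tests=500, rho=0.0, n_reps=200)
corr_mean, corr_sd = tail_fraction_spread(n_tests=500, rho=0.5, n_reps=200)

print(f"rho=0.0: mean tail fraction {indep_mean:.4f}, sd {indep_sd:.4f}")
print(f"rho=0.5: mean tail fraction {corr_mean:.4f}, sd {corr_sd:.4f}")
```

Under this assumed model the average tail fraction is roughly the same in both cases, but its spread across replications is much larger when the z-values are correlated. A null tail count taken from the theoretical N(0, 1) distribution can therefore be far off in any single correlated data set, which is why the choice of null distribution matters for false discovery rate estimates.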
