Size, power and false discovery rates

Modern scientific technology has provided a new class of large-scale simultaneous inference problems, with thousands of hypothesis tests to consider at the same time. Microarrays epitomize this type of technology, but similar situations arise in proteomics, spectroscopy, imaging, and social science surveys. This paper uses false discovery rate methods to carry out both size and power calculations on large-scale problems. A simple empirical Bayes approach allows the false discovery rate (fdr) analysis to proceed with a minimum of frequentist or Bayesian modeling assumptions. Closed-form accuracy formulas are derived for estimated false discovery rates and are used to compare different methodologies: local or tail-area fdr's, and theoretical, permutation, or empirical null hypothesis estimates. Two microarray data sets, as well as simulations, are used to evaluate the methodology; the power diagnostics show why nonnull cases might easily fail to appear on a list of "significant" discoveries.
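
As a rough illustration of the tail-area false discovery rate mentioned above, the sketch below computes an empirical Bayes estimate of the form p0 * F0(z) / Fhat(z), where F0 is the right-tail probability under a theoretical N(0, 1) null and Fhat is the observed proportion of z-values at or above the threshold. The function names, the choice p0 = 1, and the simulated data are assumptions made for this example only; they are not taken from the paper, which also treats local fdr's and permutation or empirical nulls.

```python
import math
import random


def normal_upper_tail(z):
    """Right-tail probability P(Z >= z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def tail_area_fdr(z_values, threshold, p0=1.0):
    """Estimate the tail-area false discovery rate for {z >= threshold}.

    Assumes a theoretical N(0, 1) null and a null proportion p0
    (p0 = 1.0 is the conservative choice). Returns None when no
    z-value reaches the threshold, since the ratio is then undefined.
    """
    n_total = len(z_values)
    n_exceed = sum(1 for z in z_values if z >= threshold)
    if n_exceed == 0:
        return None
    empirical_tail = n_exceed / n_total
    # Expected null proportion in the tail over the observed proportion,
    # capped at 1 because it estimates a probability.
    return min(1.0, p0 * normal_upper_tail(threshold) / empirical_tail)


if __name__ == "__main__":
    # Hypothetical simulation: 950 null cases and 50 nonnull cases
    # whose z-values are centred at 3.
    rng = random.Random(0)
    z = [rng.gauss(0.0, 1.0) for _ in range(950)]
    z += [rng.gauss(3.0, 1.0) for _ in range(50)]
    for cut in (2.0, 2.5, 3.0):
        print(cut, tail_area_fdr(z, cut))
```

A small estimated value at a given threshold suggests that most cases beyond it are nonnull; the estimate ignores correlation between cases and any mismatch between the theoretical and actual null distributions.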
