A unified approach to false discovery rate estimation

Background: False discovery rate (FDR) methods play an important role in analyzing high-dimensional data. There are two types of FDR, tail area-based FDR and local FDR, as well as numerous statistical algorithms for estimating or controlling FDR. These differ in the underlying test statistics and in the procedures employed for statistical learning.

Results: A unifying algorithm for simultaneous estimation of both local FDR and tail area-based FDR is presented that can be applied to a diverse range of test statistics, including p-values, correlations, and z- and t-scores. This approach is semiparametric and is based on a modified Grenander density estimator. For test statistics other than p-values it allows for empirical null modeling, so that dependencies among tests can be taken into account. The inference of the underlying model employs truncated maximum-likelihood estimation, with the cut-off point chosen according to the false non-discovery rate.

Conclusion: The proposed procedure generalizes a number of more specialized algorithms and thus offers a common framework for FDR estimation that is consistent across test statistics and types of FDR. In a comparative study the unified approach performs on par with the best competing, more specialized alternatives. The algorithm is implemented in R in the "fdrtool" package, available under the GNU GPL from http://strimmerlab.org/software/fdrtool/ and from the R package archive CRAN.
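To make the relationship between the two FDR types concrete, the following is a minimal Python sketch for the p-value case only. It is not the fdrtool implementation. It assumes that the null proportion `eta0` is already known (the truncated maximum-likelihood step is not shown), that the null p-values are uniform on (0, 1], and that the Grenander estimator is taken as the slope of the least concave majorant of the empirical distribution function. Clamping the density at `eta0`, so that the local FDR never exceeds 1, is one plausible reading of the "modified" estimator and is an assumption here. The function names `least_concave_majorant` and `fdr_from_pvalues` are hypothetical.

```python
from bisect import bisect_left


def least_concave_majorant(xs, ys):
    """Vertices of the least concave majorant of points with strictly increasing x."""
    hull = []
    for x3, y3 in zip(xs, ys):
        # Drop the last vertex while it lies on or below the chord
        # joining the vertex before it to the new point.
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            cross = (x2 - x1) * (y3 - y1) - (y2 - y1) * (x3 - x1)
            if cross >= 0:
                hull.pop()
            else:
                break
        hull.append((x3, y3))
    return hull


def fdr_from_pvalues(pvalues, eta0):
    """Return a list of (tail_area_fdr, local_fdr) pairs, one per p-value.

    Assumes p-values in (0, 1] and a known null proportion eta0 in (0, 1].
    """
    if not all(0.0 < p <= 1.0 for p in pvalues):
        raise ValueError("p-values must lie in (0, 1]")
    if not 0.0 < eta0 <= 1.0:
        raise ValueError("eta0 must lie in (0, 1]")

    n = len(pvalues)
    # Knots of the empirical distribution function, anchored at (0, 0) and (1, 1).
    xs, ys = [0.0], [0.0]
    for i, p in enumerate(sorted(pvalues), start=1):
        if p == xs[-1]:
            ys[-1] = i / n  # tied p-values: keep the highest ECDF value
        else:
            xs.append(p)
            ys.append(i / n)
    if xs[-1] < 1.0:
        xs.append(1.0)
        ys.append(1.0)

    hull = least_concave_majorant(xs, ys)
    hull_x = [x for x, _ in hull]

    results = []
    for p in pvalues:
        # Linear segment of the majorant that contains p.
        j = bisect_left(hull_x, p)
        (x0, y0), (x1, y1) = hull[j - 1], hull[j]
        slope = (y1 - y0) / (x1 - x0)      # Grenander density estimate at p
        cdf = y0 + slope * (p - x0)        # smoothed distribution function at p
        density = max(slope, eta0)         # assumed modification: density >= eta0
        tail_area_fdr = min(1.0, eta0 * p / cdf)
        local_fdr = eta0 / density
        results.append((tail_area_fdr, local_fdr))
    return results
```

For example, `fdr_from_pvalues([0.125, 0.125, 0.5, 1.0], eta0=0.5)` assigns the two smallest p-values a tail area-based FDR and a local FDR of 0.125 each, while the largest p-value receives a local FDR of 1. Because both quantities are derived from the same fitted distribution function, they stay mutually consistent, which is the property the abstract emphasizes.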
