Erratum: sample size determination for the false discovery rate

We have corrected the routines that implement the method of Pounds and Cheng (2005) for determining the sample size of a microarray experiment that uses the false discovery rate as the ultimate measure of statistical significance. Some routines in the original R and S-plus libraries did not properly account for differences between the definition of the noncentrality parameter in Equations (18) and (19) of Pounds and Cheng (2005) and the definition used by the internal R or S-plus function that evaluates the cumulative distribution function of the noncentral F-distribution. A corrected version of the R routine library is now available for download from www.stjuderesearch.org/depts/biostats/fdrsampsize/index.html. To avoid confusion, all routines in the revised library use the definition of Equations (18) and (19) of Pounds and Cheng (2005). For a variety of settings involving a single two-sided test, the accuracy of the revised routine library has been checked by comparing the results of the avepow.oneway routine with those of proc power in SAS and the built-in R function power.anova.test (Appendix).
Corrected results of the simulation studies of Pounds and Cheng (2005) are reported in Tables 1 and 2 and Figure 1. Table 1 gives the simulation estimate of the expected value (EV) of the average power, i.e. the EV of the ratio D of the number of true discoveries to the number of false null hypotheses, when the sample size is determined using the true values of the proportion π of null hypotheses that are true and the effect size η of the false null hypotheses. The SD of D observed over 1000 simulation replications is also reported. In all but two settings, the determined sample size gives an estimate of the average power that is greater than or equal to the desired average power δ; in the remaining two settings, the simulation estimate is slightly less than δ.
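The quantities summarized in Table 1 can be illustrated with a minimal Python sketch. It computes the ratio D for one replicate and then the mean and SD of D over many replicates. This is not the authors' library or their simulation design: the function names, the parameter names and the toy rejection rule (each false null hypothesis is rejected independently with a fixed, hypothetical probability, rather than by F-tests followed by a false discovery rate procedure) are assumptions made only for illustration.

```python
import random
import statistics


def average_power(is_false_null, rejected):
    """Return D: true discoveries divided by the number of false nulls."""
    n_false_null = sum(is_false_null)
    if n_false_null == 0:
        raise ValueError("D is undefined when no null hypothesis is false")
    true_discoveries = sum(
        1 for false_null, rej in zip(is_false_null, rejected) if false_null and rej
    )
    return true_discoveries / n_false_null


def simulate_d(n_tests=1000, prop_true_null=0.9, per_test_power=0.8,
               false_positive_rate=0.01, n_reps=1000, seed=1):
    """Toy simulation (assumed setup, not the paper's): mean and SD of D.

    Each false null is rejected with probability per_test_power and each
    true null with probability false_positive_rate, independently.
    """
    rng = random.Random(seed)
    n_true_null = round(prop_true_null * n_tests)
    is_false_null = [False] * n_true_null + [True] * (n_tests - n_true_null)
    d_values = []
    for _ in range(n_reps):
        rejected = [
            rng.random() < (per_test_power if false_null else false_positive_rate)
            for false_null in is_false_null
        ]
        d_values.append(average_power(is_false_null, rejected))
    return statistics.mean(d_values), statistics.stdev(d_values)


if __name__ == "__main__":
    mean_d, sd_d = simulate_d()
    print(f"simulation estimate of EV(D): {mean_d:.3f}, SD of D: {sd_d:.3f}")
```

In this toy setting the mean of D estimates the assumed per-test power, and the SD of D shrinks as the number of false null hypotheses grows. A comparison against a desired average power δ would then amount to checking whether the reported mean is at least δ.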
Table 2 reports the corrected results of a series of simulations that generated F-statistics for a background study with per-group sample size 4 from the assumed setting. The background F-statistics, instead of the actual effect size parameters, were used to determine the sample size with the method of Pounds and Cheng (2005). In all settings, the simulation estimate of the average power exceeded the desired average power δ. Figure 1 gives corrected results for the ‘real data simulation’ performed by resampling from a real data set that is described in Section 4.2 of Pounds and Cheng (2005). The mean of the SPLOSH

[1] Peng Liu, et al. Quick calculation for sample size while controlling false discovery rate with application to microarray analysis, 2007, Bioinform.

[2] Sin-Ho Jung, et al. Sample size for FDR-control in microarray data analysis, 2005, Bioinform.

[3] Stan Pounds, et al. Estimating the Occurrence of False Positives and False Negatives in Microarray Studies by Approximating and Partitioning the Empirical Distribution of P-values, 2003, Bioinform.

[4] Cheng Cheng, et al. Sample size determination for the false discovery rate, 2005, Bioinform.

[5] D. Allison, et al. Towards sound epistemological foundations of statistical methods for high-dimensional biology, 2004, Nature Genetics.

[6] David B. Allison, et al. Power and sample size estimation in high dimensional biology, 2004.

[7] Sin-Ho Jung, et al. Sample size calculation for multiple testing in microarray data analysis, 2005, Biostatistics.

[8] Sayan Mukherjee, et al. Estimating Dataset Size Requirements for Classifying DNA Microarray Data, 2003, J. Comput. Biol.

[9] Peng Liu, et al. Gene expression: Quick calculation for sample size while controlling false discovery rate with application to microarray analysis, 2007, Bioinform.

[10] P. Müller, et al. Optimal Sample Size for Multiple Testing, 2004.

[11] G A Whitmore, et al. Power and sample size for DNA microarray studies, 2002, Statistics in medicine.

[12] Chen-An Tsai, et al. Estimation of False Discovery Rates in Multiple Testing: Application to Gene Microarray Data, 2003, Biometrics.

[13] John D. Storey. The positive false discovery rate: a Bayesian interpretation and the q-value, 2003.

[14] L. Wasserman, et al. Operating characteristics and extensions of the false discovery rate procedure, 2002.

[15] P W Lavori, et al. Sample-size calculations for the Cox proportional hazards regression model with nonbinary covariates, 2000, Controlled clinical trials.

[16] Fred A. Wright, et al. Practical FDR-based sample size calculations in microarray experiments, 2005, Bioinform.

[17] M. Radmacher, et al. Design of studies using DNA microarrays, 2002, Genetic epidemiology.

[18] Y. Benjamini, et al. Controlling the false discovery rate: a practical and powerful approach to multiple testing, 1995.

[19] David B. Allison, et al. A mixture model approach for the analysis of microarray gene expression data, 2002.

[20] Stat Pairs, et al. Statistical Algorithms Description Document Genechip ® Array Design Data Outputs Stat Pairs Used, 2022.

[21] Y. Benjamini, et al. The control of the false discovery rate in multiple testing under dependency, 2001.

[22] Sue-Jane Wang, et al. Sample size for gene expression microarray experiments, 2005, Bioinform.

[23] Weichung Joe Shih, et al. A mixture model for estimating the local false discovery rate in DNA microarray analysis, 2004, Bioinform.

[24] S. Hora. Statistical Inference Based on Ranks, 1986.

[25] Cheng Cheng, et al. Statistical Significance Threshold Criteria For Analysis of Microarray Gene Expression Data, 2004, Statistical applications in genetics and molecular biology.

[26] Y. Benjamini, et al. Resampling-based false discovery rate controlling multiple test procedures for correlated test statistics, 1999.

[27] Eli B. Roth. Power and power, 1984, Nature.

[28] Xiangqin Cui, et al. How Many Mice and How Many Arrays? Replication in Mouse cDNA Microarray Experiments, 2004.

[29] Y. Benjamini, et al. On the Adaptive Control of the False Discovery Rate in Multiple Testing With Independent Statistics, 2000.

[30] John D. Storey, et al. Statistical significance for genomewide studies, 2003, Proceedings of the National Academy of Sciences of the United States of America.

[31] John D. Storey. A direct approach to false discovery rates, 2002.

[32] D. Bloch, et al. A simple method of sample size calculation for linear and logistic regression, 1998, Statistics in medicine.

[33] Cheng Cheng, et al. Improving false discovery rate estimation, 2004, Bioinform.

[34] J. Downing, et al. Gene Expression Profiling of Pediatric Acute Myelogenous Leukemia Materials and Methods, 2022.

[35] Yoav Benjamini, et al. Identifying differentially expressed genes using false discovery rate controlling procedures, 2003, Bioinform.

[36] P. Patnaik. The non-central χ2- and F-distributions and their applications, 1949.

[37] W. Pan, et al. How many replicates of arrays are required to detect gene expression changes in microarray experiments? A mixture model approach, 2002, Genome Biology.