Estimation and control of multiple testing error rates for microarray studies

The analysis of microarray data often involves performing a large number of statistical tests, usually at least one test per queried gene. Each test has a certain probability of reaching an incorrect inference; it is therefore crucial to estimate or control error rates that measure the occurrence of erroneous conclusions when reporting and interpreting the results of a microarray study. In recent years, many innovative statistical methods have been developed to estimate or control various error rates for microarray studies. Researchers need guidance in choosing the appropriate statistical methods for analysing these types of data sets. This review describes a family of methods that use a set of P-values to estimate or control the false discovery rate and similar error rates. Finally, these methods are classified in a manner that suggests the appropriate method for specific applications, and diagnostic procedures that can identify problems in the analysis are described.
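
As a concrete illustration of what a P-value-based method for controlling the false discovery rate can look like, the following is a minimal sketch of the widely used Benjamini-Hochberg step-up procedure. It is offered only as an example of this family of methods, not as the method the review recommends; the function name `benjamini_hochberg`, the default level `alpha=0.05`, and the sample P-values are assumptions made for illustration.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Sketch of the Benjamini-Hochberg step-up procedure.

    Returns a list of booleans, one per input P-value, that is True
    where the corresponding null hypothesis is rejected at FDR level
    alpha. The function name and default alpha are illustrative.
    """
    m = len(p_values)
    if m == 0:
        return []

    # Indices of the P-values in ascending order of P-value.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Largest 1-based rank k whose ordered P-value satisfies
    # p_(k) <= k * alpha / m; 0 means nothing is rejected.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / m:
            k_max = rank

    # Step-up rule: reject the k_max smallest P-values, including any
    # that individually missed their own threshold.
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


# Hypothetical P-values for ten genes (made up for illustration).
example_p = [0.001, 0.008, 0.039, 0.041, 0.042,
             0.060, 0.074, 0.205, 0.212, 0.216]
example_rejected = benjamini_hochberg(example_p, alpha=0.05)
```

With these made-up inputs only the two smallest P-values fall at or below their rank-scaled thresholds, so only those two tests are declared significant. The sketch assumes the P-values are valid and that the dependence among tests is of a kind for which the procedure is appropriate.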
