Resampling-based Multiple Testing for Microarray Data Analysis

The burgeoning field of genomics has revived interest in multiple testing procedures by raising new methodological and computational challenges. For example, microarray experiments generate large multiplicity problems in which thousands of hypotheses are tested simultaneously. Westfall and Young (1993) proposed resampling-based p-value adjustment procedures that are highly relevant to microarray experiments. This article discusses different criteria for error control in resampling-based multiple testing, including (a) the family-wise error rate of Westfall and Young (1993) and (b) the false discovery rate developed by Benjamini and Hochberg (1995), both from a frequentist viewpoint; and (c) the positive false discovery rate of Storey (2002a), which has a Bayesian motivation. We also introduce our recently developed fast algorithm for implementing the minP adjustment to control the family-wise error rate. Adjusted p-values from the different approaches are computed for gene expression data from two recently published microarray studies, and the properties of these multiple testing procedures are compared.
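To make the contrast between family-wise error rate and false discovery rate adjustment concrete, the following is a minimal Python sketch of two classical, non-resampling adjustments that appear in the reference list: the Holm (1979) step-down procedure for the family-wise error rate and the Benjamini and Hochberg (1995) step-up procedure for the false discovery rate. It is not the article's resampling-based minP algorithm; the function names `holm_adjust` and `bh_adjust` and the example p-values are assumptions made purely for illustration.

```python
def holm_adjust(pvals):
    """Holm step-down adjusted p-values (family-wise error rate control)."""
    m = len(pvals)
    # Indices of the hypotheses, ordered from smallest to largest raw p-value.
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The k-th smallest p-value (k = rank + 1) is multiplied by m - k + 1.
        candidate = min(1.0, (m - rank) * pvals[i])
        # Enforce monotonicity: adjusted p-values never decrease with rank.
        running_max = max(running_max, candidate)
        adjusted[i] = running_max
    return adjusted


def bh_adjust(pvals):
    """Benjamini-Hochberg step-up adjusted p-values (false discovery rate control)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest raw p-value down to the smallest.
    for rank in range(m - 1, -1, -1):
        i = order[rank]
        # The k-th smallest p-value (k = rank + 1) is multiplied by m / k.
        candidate = min(1.0, m * pvals[i] / (rank + 1))
        # Enforce monotonicity: adjusted p-values never increase going down.
        running_min = min(running_min, candidate)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for four genes (illustrative only, not from the article).
raw = [0.01, 0.04, 0.03, 0.005]
holm = holm_adjust(raw)
bh = bh_adjust(raw)
```

For any set of raw p-values, each Holm-adjusted value is at least as large as the corresponding Benjamini-Hochberg value, which reflects that family-wise error rate control is the stricter of the two criteria.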

[1]  B. L. Welch The Significance of the Difference Between Two Means when the Population Variances are Unequal , 1938 .

[2]  N. Morton Sequential tests for the detection of linkage. , 1955, American journal of human genetics.

[3]  O. J. Dunn Estimation of the Means of Dependent Variables , 1958 .

[4]  Z. Šidák Rectangular Confidence Regions for the Means of Multivariate Normal Distributions , 1967 .

[5]  P. Seeger A Note on a Method for the Analysis of Significances en masse , 1968 .

[6]  P. Sen,et al.  Nonparametric methods in multivariate analysis , 1974 .

[7]  K. Gabriel,et al.  On closed testing procedures with special reference to ordered analysis of variance , 1976 .

[8]  K. Jogdeo,et al.  Association and Probability Inequalities , 1977 .

[9]  S. Holm A Simple Sequentially Rejective Multiple Test Procedure , 1979 .

[10]  R. Simes,et al.  An improved Bonferroni procedure for multiple tests of significance , 1986 .

[11]  Rudolf Beran,et al.  Balanced Simultaneous Confidence Sets , 1988 .

[12]  B. Sorić Statistical “Discoveries” and Effect-Size Estimation , 1989 .

[13]  S. S. Young,et al.  Resampling-Based Multiple Testing: Examples and Methods for p-Value Adjustment , 1993 .

[14]  J. Shaffer Multiple Hypothesis Testing , 1995 .

[15]  Y. Benjamini,et al.  Controlling the false discovery rate: a practical and powerful approach to multiple testing , 1995 .

[16]  D. Lockhart,et al.  Expression monitoring by hybridization to high-density oligonucleotide arrays , 1996, Nature Biotechnology.

[17]  Ross Ihaka,et al.  R: A language for data analysis and graphics , 1996 .

[18]  P. Brown,et al.  Exploring the metabolic and genetic control of gene expression on a genomic scale. , 1997, Science.

[19]  P. Westfall,et al.  Multiple Tests with Discrete Distributions , 1997 .

[20]  P H Westfall,et al.  Using prior information to allocate significance levels for multiple endpoints. , 1998, Statistics in medicine.

[21]  J. Mesirov,et al.  Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. , 1999, Science.

[22]  Y. Benjamini,et al.  Resampling-based false discovery rate controlling multiple test procedures for correlated test statistics , 1999 .

[23]  U. Alon,et al.  Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. , 1999, Proceedings of the National Academy of Sciences of the United States of America.

[24]  Ash A. Alizadeh,et al.  Genome-wide analysis of DNA copy-number changes using cDNA microarrays , 1999, Nature Genetics.

[25]  Christian A. Rees,et al.  Distinctive gene expression patterns in human mammary epithelial cells and breast cancers. , 1999, Proceedings of the National Academy of Sciences of the United States of America.

[26]  S. Dudoit,et al.  Microarray expression profiling identifies genes with altered expression in HDL-deficient mice. , 2000, Genome research.

[27]  Gary A. Churchill,et al.  Analysis of Variance for Gene Expression Microarray Data , 2000, J. Comput. Biol..

[28]  Robert Tibshirani,et al.  Microarrays and Their Use in a Comparative Experiment , 2000 .

[29]  Ash A. Alizadeh,et al.  Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling , 2000, Nature.

[30]  Christian A. Rees,et al.  Systematic variation in gene expression patterns in human cancer cell lines , 2000, Nature Genetics.

[31]  Gregory R. Grant,et al.  Generation of patterns from gene expression data by assigning confidence to differentially expressed genes , 2000, Bioinform..

[32]  Y. Benjamini,et al.  The Control of the False Discovery Rate in Multiple Testing under Dependency , 2001 .

[33]  R. Tibshirani,et al.  Significance analysis of microarrays applied to the ionizing radiation response , 2001, Proceedings of the National Academy of Sciences of the United States of America.

[34]  Christopher R. Genovese,et al.  Operating Characteristics and Extensions of the FDR Procedure , 2001 .

[35]  H. Finner,et al.  On the False Discovery Rate and Expected Type I Errors , 2001 .

[36]  F. Pesarin Multivariate Permutation Tests : With Applications in Biostatistics , 2001 .

[37]  Peter H. Westfall,et al.  Using Priors to Improve Multiple Animal Carcinogenicity Tests , 2001 .

[38]  John D. Storey,et al.  Empirical Bayes Analysis of a Microarray Experiment , 2001 .

[39]  John D. Storey A direct approach to false discovery rates , 2002 .

[40]  Ash A. Alizadeh,et al.  Stereotyped and specific gene expression programs in human innate immune responses to bacteria , 2002, Proceedings of the National Academy of Sciences of the United States of America.

[41]  Dmitri V Zaykin,et al.  Multiple tests for genetic effects in association studies. , 2002, Methods in molecular biology.

[42]  R. Tibshirani,et al.  Empirical bayes methods and false discovery rates for microarrays , 2002, Genetic epidemiology.

[43]  S. Dudoit,et al.  Statistical Methods for Identifying Differentially Expressed Genes in Replicated cDNA Microarray Experiments , 2002 .

[44]  Henry Braun,et al.  John W. Tukey's contributions to multiple comparisons , 2002 .

[45]  S. Dudoit,et al.  Resampling-based multiple testing for microarray data analysis , 2003 .

[46]  C M Kendziorski,et al.  On parametric empirical Bayes methods for comparing multiple groups using replicated gene expression profiles , 2003, Statistics in medicine.

[47]  S. Dudoit,et al.  Multiple Hypothesis Testing in Microarray Experiments , 2003 .

[48]  John D. Storey The positive false discovery rate: a Bayesian interpretation and the q-value , 2003 .

[49]  P. Müller,et al.  Optimal Sample Size for Multiple Testing , 2004 .

[50]  C. Robert,et al.  Optimal Sample Size for Multiple Testing: the Case of Gene Expression Microarrays , 2004 .

[51]  John D. Storey,et al.  Strong control, conservative point estimation and simultaneous conservative consistency of false discovery rates: a unified approach , 2004 .

[52]  P. Westfall,et al.  Weighted FWE-controlling methods in high-dimensional situations , 2004 .

[53]  R. Simon,et al.  Controlling the number of false discoveries: application to high-dimensional genomic data , 2004 .

[54]  Stephen E. Fienberg,et al.  Testing Statistical Hypotheses , 2005 .