Resampling-based Multiple Testing: Asymptotic Control of Type I Error and Applications to Gene Expression Data

We define a general statistical framework for multiple hypothesis testing and show that the correct null distribution for the test statistics is obtained by projecting the true distribution of the test statistics onto the space of mean zero distributions. For common choices of test statistics (based on an asymptotically linear parameter estimator), this distribution is asymptotically multivariate normal with mean zero and the covariance of the vector influence curve for the parameter estimator. This test statistic null distribution can be estimated by applying the non-parametric or parametric bootstrap to correctly centered test statistics. We prove that this bootstrap estimated null distribution provides asymptotic control of most type I error rates. We show that obtaining a test statistic null distribution from a data null distribution (e.g., projecting the data generating distribution onto the space of all distributions satisfying the complete null) only provides the correct test statistic null distribution if the covariance of the vector influence curve is the same under the data null distribution as under the true data distribution. This condition is a weak version of the subset pivotality condition. We show that our multiple testing methodology controlling type I error is equivalent to constructing an error-specific confidence region for the true parameter and checking whether it contains the hypothesized value. We also study the two sample problem and show that the permutation distribution produces an asymptotically correct null distribution if (i) the sample sizes are equal or (ii) the populations have the same covariance structure. We include a discussion of the application of multiple testing to gene expression data, where the dimension typically far exceeds the sample size. An analysis of a cancer gene expression data set illustrates the methodology.
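The bootstrap step described above can be made concrete with a small sketch. It is an illustration under stated assumptions, not the authors' implementation: it assumes one-sample t-statistics for the hypotheses that each column mean equals `mu0`, estimates the null distribution by non-parametric bootstrap of statistics centered at the observed mean (rather than at `mu0`), and uses a single-step maximum-statistic cutoff as one possible way to target the family-wise error rate. The function name `bootstrap_maxt` and all of its parameters are hypothetical.

```python
import numpy as np

def bootstrap_maxt(X, mu0=0.0, B=1000, alpha=0.05, seed=0):
    """Illustrative single-step max-|t| test with a bootstrap null.

    X is an (n, m) array: n observations of m variables (e.g. genes).
    Tests H0_j: E[X_j] = mu0 for each column j.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    mean = X.mean(axis=0)
    sd = X.std(axis=0, ddof=1)

    # Observed statistics are centered at the hypothesized value.
    t_obs = np.sqrt(n) * (mean - mu0) / sd

    max_abs = np.empty(B)
    for b in range(B):
        # Resample whole rows so the dependence between columns is kept.
        Xb = X[rng.integers(0, n, size=n)]
        # Bootstrap statistics are centered at the observed mean, so the
        # estimated null distribution has mean (approximately) zero even
        # when some hypotheses are false.
        tb = np.sqrt(n) * (Xb.mean(axis=0) - mean) / Xb.std(axis=0, ddof=1)
        max_abs[b] = np.abs(tb).max()

    # Common cutoff: (1 - alpha) quantile of the bootstrap max |t|.
    cutoff = np.quantile(max_abs, 1 - alpha)
    reject = np.abs(t_obs) > cutoff
    return t_obs, cutoff, reject
```

In this sketch, rejecting hypothesis j is the same as checking that `mu0` falls outside the interval `mean[j] ± cutoff * sd[j] / sqrt(n)`, which mirrors the confidence-region equivalence stated in the abstract.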
