Resampling-Based Multiple Hypothesis Testing with Applications to Genomics: New Developments in the R/Bioconductor Package multtest

The multtest package is a standard Bioconductor package containing a suite of functions for executing, summarizing, and displaying the results of a wide variety of multiple testing procedures (MTPs). The package offers many popular MTPs, but its central methodological focus is the implementation of powerful joint multiple testing procedures. Joint MTPs account for the dependencies between test statistics by making use of (estimates of) the test statistics' joint null distribution. To this end, two additional bootstrap-based estimates of the test statistics' joint null distribution have been developed for use in the package. For asymptotically linear estimators involving single-parameter hypotheses (such as tests of means, regression parameters, and correlation parameters using t-statistics), a computationally efficient joint null distribution estimate based on influence curves is now also available. New MTPs implemented in multtest include marginal adaptive procedures for control of the false discovery rate (FDR), as well as empirical Bayes joint MTPs, which can control any Type I error rate defined as a function of the numbers of false positives and true positives. Examples of such error rates include, among others, the familywise error rate and the FDR. S4 methods are available for objects of the new class EBMTP, and particular attention has been given to reducing the need for repeated resampling between function calls.
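To make the idea of a joint MTP concrete, the following is a minimal sketch of a single-step maxT procedure that uses a bootstrap estimate of the joint null distribution, obtained by centering and scaling the bootstrap test statistics. It is written in plain Python for illustration only and is not the multtest API; the function names `t_stats` and `maxt_adjusted_pvalues`, the one-sample t-statistic setting, and the null values (mean 0, variance at most 1) are assumptions made for this example.

```python
import random
import statistics


def t_stats(data):
    """One-sample t-statistic per hypothesis (column), testing H0: mean = 0.

    Assumes continuous data, so that no column has zero sample variance.
    """
    n = len(data)
    m = len(data[0])
    out = []
    for j in range(m):
        col = [row[j] for row in data]
        se = statistics.stdev(col) / n ** 0.5
        out.append(statistics.mean(col) / se)
    return out


def maxt_adjusted_pvalues(data, B=500, seed=0):
    """Single-step maxT adjusted p-values (two-sided) from a bootstrap null.

    Hypothetical helper for illustration: rows are subjects, columns are
    hypotheses.
    """
    rng = random.Random(seed)
    n, m = len(data), len(data[0])
    observed = [abs(t) for t in t_stats(data)]

    # Resample whole rows so the dependence between columns is preserved.
    boot = []
    for _ in range(B):
        sample = [data[rng.randrange(n)] for _ in range(n)]
        boot.append(t_stats(sample))

    # Center each column at the null mean 0 and shrink its variance to at
    # most 1 (assumed null values for t-statistics).
    null = [[0.0] * m for _ in range(B)]
    for j in range(m):
        col = [boot[b][j] for b in range(B)]
        mean_j = statistics.mean(col)
        var_j = statistics.variance(col)
        scale = min(1.0, 1.0 / var_j) ** 0.5 if var_j > 0 else 1.0
        for b in range(B):
            null[b][j] = scale * (col[b] - mean_j)

    # The joint null enters through the maximum over all hypotheses.
    max_null = [max(abs(z) for z in row) for row in null]
    return [sum(mx >= obs for mx in max_null) / B for obs in observed]


if __name__ == "__main__":
    gen = random.Random(1)
    # Five hypotheses, twenty subjects; only the first column has a true effect.
    example = [[gen.gauss(3.0 if j == 0 else 0.0, 1.0) for j in range(5)]
               for _ in range(20)]
    print(maxt_adjusted_pvalues(example, B=300, seed=2))
```

Because every adjusted p-value is computed against the same distribution of maxima, correlated test statistics are handled jointly rather than through a marginal correction such as Bonferroni. A step-down variant would take maxima over successively smaller subsets of hypotheses.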
