A two-step hierarchical hypothesis set testing framework, with applications to gene expression data on ordered categories

BackgroundIn complex large-scale experiments, in addition to simultaneously considering a large number of features, multiple hypotheses are often being tested for each feature. This leads to a problem of multi-dimensional multiple testing. For example, in gene expression studies over ordered categories (such as time-course or dose-response experiments), interest is often in testing differential expression across several categories for each gene. In this paper, we consider a framework for testing multiple sets of hypothesis, which can be applied to a wide range of problems.ResultsWe adopt the concept of the overall false discovery rate (OFDR) for controlling false discoveries on the hypothesis set level. Based on an existing procedure for identifying differentially expressed gene sets, we discuss a general two-step hierarchical hypothesis set testing procedure, which controls the overall false discovery rate under independence across hypothesis sets. In addition, we discuss the concept of the mixed-directional false discovery rate (mdFDR), and extend the general procedure to enable directional decisions for two-sided alternatives. We applied the framework to the case of microarray time-course/dose-response experiments, and proposed three procedures for testing differential expression and making multiple directional decisions for each gene. Simulation studies confirm the control of the OFDR and mdFDR by the proposed procedures under independence and positive correlations across genes. Simulation results also show that two of our new procedures achieve higher power than previous methods. Finally, the proposed methodology is applied to a microarray dose-response study, to identify 17 β-estradiol sensitive genes in breast cancer cells that are induced at low concentrations.ConclusionsThe framework we discuss provides a platform for multiple testing procedures covering situations involving two (or potentially more) sources of multiplicity. The framework is easy to use and adaptable to various practical settings that frequently occur in large-scale experiments. Procedures generated from the framework are shown to maintain control of the OFDR and mdFDR, quantities that are especially relevant in the case of multiple hypothesis set testing. The procedures work well in both simulations and real datasets, and are shown to have better power than existing methods.

[1]  Wenguang Sun,et al.  Multiple Testing for Pattern Identification, With Applications to Microarray Time-Course Experiments , 2011 .

[2]  Y. Benjamini,et al.  Controlling the false discovery rate: a practical and powerful approach to multiple testing , 1995 .

[3]  S. Sarkar,et al.  The Simes Method for Multiple Hypothesis Testing with Positively Dependent Test Statistics , 1997 .

[4]  Y. Benjamini,et al.  Screening for Partial Conjunction Hypotheses , 2008, Biometrics.

[5]  S. Holm A Simple Sequentially Rejective Multiple Test Procedure , 1979 .

[6]  H. Finner,et al.  Stepwise multiple test procedures and control of directional errors , 1999 .

[7]  Y. Hochberg A sharper Bonferroni procedure for multiple tests of significance , 1988 .

[8]  Gregory R. Grant,et al.  A flexible two-stage procedure for identifying gene sets that are differentially expressed , 2009, Bioinform..

[9]  Y. Benjamini,et al.  THE CONTROL OF THE FALSE DISCOVERY RATE IN MULTIPLE TESTING UNDER DEPENDENCY , 2001 .

[10]  R. Simes,et al.  An improved Bonferroni procedure for multiple tests of significance , 1986 .

[11]  K. Coser,et al.  Global analysis of ligand sensitivity of estrogen inducible and suppressible genes in MCF7/BUS breast cancer cells by DNA microarray , 2003, Proceedings of the National Academy of Sciences of the United States of America.

[12]  Wenge Guo,et al.  Controlling False Discoveries in Multidimensional Directional Decisions, with Applications to Gene Expression Data on Ordered Categories , 2010, Biometrics.